共查询到20条相似文献,搜索用时 359 毫秒
1.
Minchao Wang Wu Zhang Wang Ding Dongbo Dai Huiran Zhang Hao Xie Luonan Chen Yike Guo Jiang Xie 《PloS one》2014,9(4)
Backgrounds
Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied into the biological researches. However, the time and space complexity become a great bottleneck when handling the large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs.Methods
Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes.Result
A speedup of 100 is gained with 128 cores. The runtime is reduced from serval hours to a few seconds, which indicates that parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. 相似文献2.
Background
Computational prediction of functionally related groups of genes (functional modules) from large-scale data is an important issue in computational biology. Gene expression experiments and interaction networks are well studied large-scale data sources, available for many not yet exhaustively annotated organisms. It has been well established, when analyzing these two data sources jointly, modules are often reflected by highly interconnected (dense) regions in the interaction networks whose participating genes are co-expressed. However, the tractability of the problem had remained unclear and methods by which to exhaustively search for such constellations had not been presented.Methodology/Principal Findings
We provide an algorithmic framework, referred to as Densely Connected Biclustering (DECOB), by which the aforementioned search problem becomes tractable. To benchmark the predictive power inherent to the approach, we computed all co-expressed, dense regions in physical protein and genetic interaction networks from human and yeast. An automatized filtering procedure reduces our output which results in smaller collections of modules, comparable to state-of-the-art approaches. Our results performed favorably in a fair benchmarking competition which adheres to standard criteria. We demonstrate the usefulness of an exhaustive module search, by using the unreduced output to more quickly perform GO term related function prediction tasks. We point out the advantages of our exhaustive output by predicting functional relationships using two examples.Conclusion/Significance
We demonstrate that the computation of all densely connected and co-expressed regions in interaction networks is an approach to module discovery of considerable value. Beyond confirming the well settled hypothesis that such co-expressed, densely connected interaction network regions reflect functional modules, we open up novel computational ways to comprehensively analyze the modular organization of an organism based on prevalent and largely available large-scale datasets.Availability
Software and data sets are available at http://www.sfu.ca/~ester/software/DECOB.zip. 相似文献3.
4.
5.
6.
7.
8.
9.
Background
De novo genome assembly of next-generation sequencing data is one of the most important current problems in bioinformatics, essential in many biological applications. In spite of significant amount of work in this area, better solutions are still very much needed.Results
We present a new program, SAGE, for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on string-overlap graph and maximum likelihood assembly, bringing an important number of new ideas, such as the efficient computation of the transitive reduction of the string overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.Conclusions
SAGE benefits from innovations in almost every aspect of the assembly process: error correction of input reads, string-overlap graph construction, read copy counts estimation, overlap graph analysis and reduction, contig extraction, and scaffolding. We hope that these new ideas will help advance the current state-of-the-art in an essential area of research in genomics.Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-302) contains supplementary material, which is available to authorized users. 相似文献10.
Background
Gene expression profiling using microarrays is a powerful technology widely used to study regulatory networks. Profiling of mRNA levels in mutant organisms has the potential to identify genes regulated by the mutated protein.Methodology/Principle Findings
Using tissues from multiple lines of knockout mice we have examined genome-wide changes in gene expression. We report that a significant proportion of changed genes were found near the targeted gene.Conclusions/Significance
The apparent clustering of these genes was explained by the presence of flanking DNA from the parental ES cell. We provide recommendations for the analysis and reporting of microarray data from knockout mice 相似文献11.
12.
Background
Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.Principal Findings
We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets we obtained a 96 kbp N50 in Pseudomonas syringae and a unique 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in the case of mixed length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly.Conclusions
These algorithms extend the utility of short read only assemblies into large complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler. 相似文献13.
14.
Zinn PO Mahajan B Majadan B Sathyan P Singh SK Majumder S Jolesz FA Colen RR 《PloS one》2011,6(10):e25451
Background
Despite recent discoveries of new molecular targets and pathways, the search for an effective therapy for Glioblastoma Multiforme (GBM) continues. A newly emerged field, radiogenomics, links gene expression profiles with MRI phenotypes. MRI-FLAIR is a noninvasive diagnostic modality and was previously found to correlate with cellular invasion in GBM. Thus, our radiogenomic screen has the potential to reveal novel molecular determinants of invasion. Here, we present the first comprehensive radiogenomic analysis using quantitative MRI volumetrics and large-scale gene- and microRNA expression profiling in GBM.Methods
Based on The Cancer Genome Atlas (TCGA), discovery and validation sets with gene, microRNA, and quantitative MR-imaging data were created. Top concordant genes and microRNAs correlated with high FLAIR volumes from both sets were further characterized by Kaplan Meier survival statistics, microRNA-gene correlation analyses, and GBM molecular subtype-specific distribution.Results
The top upregulated gene in both the discovery (4 fold) and validation (11 fold) sets was PERIOSTIN (POSTN). The top downregulated microRNA in both sets was miR-219, which is predicted to bind to POSTN. Kaplan Meier analysis demonstrated that above median expression of POSTN resulted in significantly decreased survival and shorter time to disease progression (P<0.001). High POSTN and low miR-219 expression were significantly associated with the mesenchymal GBM subtype (P<0.0001).Conclusion
Here, we propose a novel diagnostic method to screen for molecular cancer subtypes and genomic correlates of cellular invasion. Our findings also have potential therapeutic significance since successful molecular inhibition of invasion will improve therapy and patient survival in GBM. 相似文献15.
16.
17.
Background
Previous studies using hierarchical clustering approach to analyze resting-state fMRI data were limited to a few slices or regions-of-interest (ROIs) after substantial data reduction.Purpose
To develop a framework that can perform voxel-wise hierarchical clustering of whole-brain resting-state fMRI data from a group of subjects.Materials and Methods
Resting-state fMRI measurements were conducted for 86 adult subjects using a single-shot echo-planar imaging (EPI) technique. After pre-processing and co-registration to a standard template, pair-wise cross-correlation coefficients (CC) were calculated for all voxels inside the brain and translated into absolute Pearson''s distances after imposing a threshold CC≥0.3. The group averages of the Pearson''s distances were then used to perform hierarchical clustering with the developed framework, which entails gray matter masking and an iterative scheme to analyze the dendrogram.Results
With the hierarchical clustering framework, we identified most of the functional connectivity networks reported previously in the literature, such as the motor, sensory, visual, memory, and the default-mode functional networks (DMN). Furthermore, the DMN and visual system were split into their corresponding hierarchical sub-networks.Conclusion
It is feasible to use the proposed hierarchical clustering scheme for voxel-wise analysis of whole-brain resting-state fMRI data. The hierarchical clustering result not only confirmed generally the finding in functional connectivity networks identified previously using other data processing techniques, such as ICA, but also revealed directly the hierarchical structure within the functional connectivity networks. 相似文献18.
19.
Background
Recently, CD4+IL-17A+ T helper 17 (Th17) cells were identified and reported in several diseased states, including autoimmunity, infection and various peripheral nervous system tumors. However, the presence of Th17 in glia-derived tumors of the central nervous system has not been studied.Methodology/Principal Findings
In this report, we demonstrate that mRNA expression for the Th17 cell cytokine IL-17A, as well as Th17 cells, are present in human glioma. The mRNA expression for IL-17A in glioma was recapitulated in an immunocompetent mouse model of malignant glioma. Furthermore, the presence of Th17 cells was confirmed in both human and mouse glioma. Interestingly, some Th17 cells present in mouse glioma co-expressed the Th1 and Th2 lineage markers, IFN-γ and IL-4, respectively, but predominantly co-expressed the Treg lineage marker FoxP3.Conclusions
These data confirm the presence of Th17 cells in glia-derived CNS tumors and provide the rationale for further investigation into the role of Th17 cells in malignant glioma. 相似文献20.
Xiaopei Shen Shan Li Lin Zhang Hongdong Li Guini Hong XianXiao Zhou Tingting Zheng Wenjing Zhang Chunxiang Hao Tongwei Shi Chunyang Liu Zheng Guo 《PloS one》2013,8(4)