1.

Identification of fusion genes in breast cancer by paired-end RNA-sequencing

Edgren H Murumagi A Kangaspeska S Nicorici D Hongisto V Kleivi K Rye IH Nyberg S Wolf M Borresen-Dale AL Kallioniemi O 《Genome biology》2011,12(1):R6-13

2.

Annotating gene function by combining expression data with a modular gene network

Shiga M Takigawa I Mamitsuka H 《Bioinformatics (Oxford, England)》2007,23(13):i468-i478

MOTIVATION: A promising and reliable approach to annotating gene function is to cluster genes using not only gene expression data but also literature information, especially gene networks. RESULTS: We present a systematic method for gene clustering that combines these two very different types of data, focusing in particular on network modularity, a global feature of gene networks. Our method is based on learning a probabilistic model, which we call a hidden modular random field, in which the relation between hidden variables directly represents a given gene network. Our learning algorithm, which minimizes an energy function that accounts for network modularity, is time-efficient in practice despite using this global network property. We evaluated our method using a metabolic network and microarray expression data, varying the microarray datasets, the parameters of our model and the gold standard clusters. Experimental results showed that our method outperformed four competing methods, including k-means and existing graph partitioning methods, with statistical significance in all cases. Further detailed analysis showed that our method could group a set of genes into a cluster corresponding to the folate metabolic pathway, while the other methods could not. From these results, we conclude that our method is highly effective for gene clustering and for annotating gene function.
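
The abstract above relies on network modularity but does not define it. As a rough illustration only, the sketch below computes the standard Newman-Girvan modularity Q of a partition of an undirected, unweighted graph. That definition is an assumption here (the abstract does not say which variant the authors use), and the function name `modularity` and the toy graph are hypothetical; this is not the hidden modular random field itself.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities c of (l_c / m) - (d_c / (2 m))^2
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
toy_q = modularity(toy_edges, toy_split)
print(round(toy_q, 4))
```

Splitting the toy graph into its two triangles gives a positive Q, while putting every node in one community gives Q = 0, which is the sense in which modularity rewards densely connected groups.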

3.

RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries

Habegger L Sboner A Gianoulis TA Rozowsky J Agarwal A Snyder M Gerstein M 《Bioinformatics (Oxford, England)》2011,27(2):281-283

SUMMARY: The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. Availability and implementation: RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.

4.

A general modular framework for gene set enrichment analysis

Marit Ackermann Korbinian Strimmer 《BMC bioinformatics》2009,10(1):47-20

Background

Analysis of microarray and other high-throughput data on the basis of gene sets, rather than individual genes, is becoming more important in genomic studies. Correspondingly, a large number of statistical approaches for detecting gene set enrichment have been proposed, but both the interrelations and the relative performance of the various methods are still very much unclear.

5.

Open-access synthetic spike-in mRNA-seq data for cancer gene fusions

Waibhav D Tembe Stephanie JK Pond Christophe Legendre Han-Yu Chuang Winnie S Liang Nancy E Kim Valerie Montel Shukmei Wong Timothy K McDaniel David W Craig John D Carpten 《BMC genomics》2014,15(1)

6.

A classification-based framework for predicting and analyzing gene regulatory response

Kundaje A Middendorf M Shah M Wiggins CH Freund Y Leslie C 《BMC bioinformatics》2006,7(Z1):S5

7.

R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data

Mittal VK McDonald JF 《Nucleic acids research》2012,40(9):e67

8.

A general framework for analyzing data from two short time-series microarray experiments

Shah M Corbeil J 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(1):14-26

We propose a general theoretical framework for analyzing differentially expressed genes and behavior patterns from two homogeneous short time-course data sets. The framework generalizes the recently proposed Hilbert-Schmidt Independence Criterion (HSIC)-based framework, adapting it to the time-series scenario by using tensor analysis for data transformation. The proposed framework is effective in yielding criteria that can identify both the differentially expressed genes and the time-course patterns of interest between two time-series experiments without requiring explicit clustering of the data. The results obtained by applying the proposed framework, with a linear kernel formulation, to various data sets are found to be both biologically meaningful and consistent with published studies.
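
For orientation, the following is a minimal sketch of the plain (biased) empirical HSIC estimate with a linear kernel, tr(K H L H) / (n - 1)^2, where H is the centering matrix. It assumes the standard estimator and does not reproduce the tensor-based time-series extension described in the abstract. The helper names and toy data are hypothetical.

```python
def linear_kernel(rows):
    """Gram matrix with entries K[i][j] = <rows[i], rows[j]>."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def center(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - col_mean[j] + total
             for j in range(n)] for i in range(n)]


def hsic_linear(xs, ys):
    """Biased empirical HSIC of paired samples xs, ys (lists of vectors)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    kc = center(linear_kernel(xs))
    l = linear_kernel(ys)
    # tr(K H L H) = tr((H K H) L) = sum_ij (H K H)[i][j] * L[j][i]
    trace = sum(kc[i][j] * l[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Hypothetical toy data: ys depends linearly on xs; flat does not vary.
xs = [[1.0], [2.0], [3.0], [4.0]]
ys = [[2.0], [4.0], [6.0], [8.0]]
flat = [[5.0], [5.0], [5.0], [5.0]]
dependent = hsic_linear(xs, ys)
independent = hsic_linear(xs, flat)
print(dependent, independent)
```

With a linear kernel the statistic only captures linear dependence, so it is large for the linearly related pair and essentially zero when one variable is constant.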

9.

Tiling Assembly: a new tool for reference annotation-independent transcript assembly and novel gene identification by RNA-sequencing

Kenneth A. Watanabe Arielle Homayouni Tara Tufano Jennifer Lopez Patricia Ringler Paul Rushton Qingxi J. Shen 《DNA research》2015,22(5):319-329

10.

Dimension reduction strategies for analyzing global gene expression data with a response (Total citations: 5; self-citations: 0; citations by others: 5)

Chiaromonte F Martinelli J 《Mathematical biosciences》2002,176(1):123-144

The analysis of global gene expression data from microarrays is breaking new ground in genetics research, while confronting modelers and statisticians with many critical issues. In this paper, we consider data sets in which a categorical or continuous response is recorded, along with gene expression, on a given number of experimental samples. Data of this type are usually employed to create a prediction mechanism for the response based on gene expression, and to identify a subset of relevant genes. This defines a regression setting characterized by a dramatic under-resolution with respect to the predictors (genes), whose number exceeds by orders of magnitude the number of available observations (samples). We present a dimension reduction strategy that, under appropriate assumptions, allows us to restrict attention to a few linear combinations of the original expression profiles, and thus to overcome under-resolution. These linear combinations can then be used to build and validate a regression model with standard techniques. Moreover, they can be used to rank original predictors, and ultimately to select a subset of them through comparison with a background 'chance scenario' based on a number of independent randomizations. We apply this strategy to publicly available data on leukemia classification.

11.

CLUM: a cluster program for analyzing microarray data

Irigoien I Fernandez E Vives S Arenas C 《Genetika》2008,44(8):1137-1140

Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems. Cluster analysis has proven to be a very useful tool for investigating the structure of microarray data. This paper presents a program for clustering microarray data that is based on the so-called path distance. At each step the algorithm produces a partition into two clusters, and no prior assumptions about the structure of the clusters are required. It assigns each object (gene or sample) to only one cluster and gives the global optimum for the function that quantifies the adequacy of a given partition of the sample into k clusters. The program was tested on experimental data sets, showing the robustness of the algorithm.

12.

A non-negative matrix factorization framework for identifying modular patterns in metagenomic profile data

Jiang X Weitz JS Dushoff J 《Journal of mathematical biology》2012,64(4):697-711

Metagenomic studies sequence DNA directly from environmental samples to explore the structure and function of complex microbial and viral communities. Individual, short pieces of sequenced DNA (“reads”) are classified into (putative) taxonomic or metabolic groups which are analyzed for patterns across samples. Analysis of such read matrices is at the core of using metagenomic data to make inferences about ecosystem structure and function. Non-negative matrix factorization (NMF) is a numerical technique for approximating high-dimensional data points as positive linear combinations of positive components. It is thus well suited to interpretation of observed samples as combinations of different components. We develop, test and apply an NMF-based framework to analyze metagenomic read matrices. In particular, we introduce a method for choosing NMF degree in the presence of overlap, and apply spectral-reordering techniques to NMF-based similarity matrices to aid visualization. We show that our method can robustly identify the appropriate degree and disentangle overlapping contributions using synthetic data sets. We then examine and discuss the NMF decomposition of a metabolic profile matrix extracted from 39 publicly available metagenomic samples, and identify canonical sample types, including one associated with coral ecosystems, one associated with highly saline ecosystems and others. We also identify specific associations between pathways and canonical environments, and explore how alternative choices of decompositions facilitate analysis of read matrices at a finer scale.
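
To make the factorization step concrete, here is a minimal sketch of NMF using the well-known Lee-Seung multiplicative updates for the Frobenius-norm objective, approximating a non-negative matrix V by W H. It is an assumption that this update rule is representative; the abstract does not say which NMF algorithm the authors use, and their degree-selection and spectral-reordering steps are not shown. The function names and toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy "read matrix": rows are samples, columns are pathways.
toy_v = [[5.0, 4.0, 0.0, 0.0],
         [4.0, 5.0, 0.0, 1.0],
         [0.0, 0.0, 3.0, 4.0],
         [0.0, 1.0, 4.0, 3.0]]
w, h = nmf(toy_v, rank=2)
print(frobenius_error(toy_v, w, h))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what lets each sample be read as an additive mixture of components.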

13.

SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data

Wenlong Jia Kunlong Qiu Minghui He Pengfei Song Quan Zhou Feng Zhou Yuan Yu Dandan Zhu Michael L Nickerson Shengqing Wan Xiangke Liao Xiaoqian Zhu Shaoliang Peng Yingrui Li Jun Wang Guangwu Guo 《Genome biology》2013,14(2):R12

14.

A framework for analyzing DNA methylation data from Illumina Infinium HumanMethylation450 BeadChip

Zhenxing Wang XiaoLiang Wu Yadong Wang 《BMC bioinformatics》2018,19(5):115

Background

DNA methylation has been found to be widely associated with complex diseases. Among the platforms used to profile DNA methylation in humans, the Illumina Infinium HumanMethylation450 BeadChip (450K) has been accepted as one of the most efficient technologies. However, analysis of the DNA methylation data generated by this technology remains challenging because of widespread biases.

Results

Here we propose a generalized framework for evaluating data analysis methods for the Illumina 450K array. The framework considers the following steps towards a successful analysis: importing data, quality control, within-array normalization, correcting type bias, detecting differentially methylated probes or regions, and biological interpretation.

Conclusions

We evaluated five methods using three real datasets and identified the best-performing methods for Illumina 450K array data analysis. Minfi and methylumi are the optimal choices when analyzing small datasets. BMIQ and RCP are suitable for correcting type bias, and their normalized results can be used to discover differentially methylated probes (DMPs). The R package missMethyl is suitable for GO term enrichment analysis and biological interpretation.

15.

A general framework for biclustering gene expression data

Li H Chen X Zhang K Jiang T 《Journal of bioinformatics and computational biology》2006,4(4):911-933

A large number of biclustering methods have been proposed to detect patterns in gene expression data. All of these methods try to find some type of bicluster, but none can discover all the types of patterns in the data. Furthermore, researchers have to design new algorithms in order to find new types of biclusters or patterns that interest biologists. In this paper, we propose a novel approach to biclustering that, in general, can be used to discover all computable patterns in gene expression data. The method is based on the theory of Kolmogorov complexity. More precisely, we use Kolmogorov complexity to measure the randomness of submatrices as the merit of biclusters, because randomness naturally consists in a lack of regularity, which is a common property of all types of patterns. On the basis of the algorithmic probability measure, we develop a Markov Chain Monte Carlo algorithm to search for biclusters. Our method can also be easily extended to solve the problems of conventional clustering and checkerboard-type biclustering. Preliminary experiments on simulated as well as real data show that our approach is very versatile and promising.

16.

ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data

Friedman BA Maniatis T 《Genome biology》2011,12(7):R69

RNA-Seq and microarray platforms have emerged as important tools for detecting changes in gene expression and RNA processing in biological samples. We present ExpressionPlot, a software package consisting of a default back end, which prepares raw sequencing or Affymetrix microarray data, and a web-based front end, which offers a biologically centered interface to browse, visualize, and compare different data sets. Download and installation instructions, a user's manual, discussion group, and a prototype are available at .

17.

Biana: a software framework for compiling biological interactions and analyzing networks

Javier Garcia-Garcia Emre Guney Ramon Aragues Joan Planas-Iglesias Baldo Oliva 《BMC bioinformatics》2010,11(1):56

Background

The analysis and usage of biological data is hindered by the spread of information across multiple repositories and the difficulties posed by different nomenclature systems and storage formats. In particular, there is an important need for data unification in the study and use of protein-protein interactions. Without good integration strategies, it is difficult to analyze the whole set of available data and its properties.

18.

PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data

Zhang Y Lameijer EW 't Hoen PA Ning Z Slagboom PE Ye K 《Bioinformatics (Oxford, England)》2012,28(4):479-486

19.

Analysis of cellulose synthase gene expression strategies in higher plants using RNA-sequencing data

Ts. A. Padvitski D. V. Galinousky N. V. Anisimova G. Ya. Baer Ya. V. Pirko A. I. Yemets L. V. Khotyleva Ya. B. Blume A. V. Kilchevsky 《Cytology and Genetics》2017,51(1):8-17

20.

Delta plots: a tool for analyzing phylogenetic distance data

Holland BR Huber KT Dress A Moulton V 《Molecular biology and evolution》2002,19(12):2051-2059

A method is described that allows the assessment of treelikeness of phylogenetic distance data before tree estimation. This method is related to statistical geometry as introduced by Eigen, Winkler-Oswatitsch, and Dress (1988 [Proc. Natl. Acad. Sci. USA. 85:5913-5917]) and, in essence, displays a measure of the treelikeness of quartets as a histogram that we call a delta plot. This allows identification of nontreelike data and analysis of noisy data sets arising from processes such as parallel evolution, recombination, or lateral gene transfer. In addition to an overall assessment of treelikeness, individual taxa can be ranked by reference to the treelikeness of the quartets to which they belong. Removal of taxa on the basis of this ranking results in an increase in accuracy of tree estimation. Recombinant data sets are simulated, and the method is shown to be capable of identifying single recombinant taxa on the basis of distance information alone, provided the parents of the recombinant sequence are sufficiently divergent and the mixture of tree histories is not strongly skewed toward a single tree. Delta plots and taxon rankings are applied to three biological data sets using distances derived from sequence alignment, gene order, and fragment length polymorphism.
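
The quartet measure behind a delta plot can be sketched as follows. For four taxa, the three pairwise distance sums are sorted, and delta is the gap between the two largest sums divided by the gap between the largest and the smallest; it is 0 when the quartet fits a tree exactly and approaches 1 as it becomes less treelike. This is my reading of the usual definition and should be treated as an assumption, as should ranking taxa by the mean delta of their quartets. The function names and toy matrices are hypothetical.

```python
from itertools import combinations


def quartet_delta(d, x, y, u, v):
    """Delta value of one quartet from a symmetric distance matrix d."""
    smallest, middle, largest = sorted([d[x][y] + d[u][v],
                                        d[x][u] + d[y][v],
                                        d[x][v] + d[y][u]])
    if largest == smallest:
        return 0.0  # all three sums equal: star-like quartet, treat as treelike
    return (largest - middle) / (largest - smallest)


def delta_values(d):
    """Delta values of all quartets; a histogram of these is the delta plot."""
    return [quartet_delta(d, *q) for q in combinations(range(len(d)), 4)]


def taxon_mean_delta(d):
    """Mean delta over the quartets each taxon belongs to (assumed ranking score)."""
    totals = [0.0] * len(d)
    counts = [0] * len(d)
    for q in combinations(range(len(d)), 4):
        value = quartet_delta(d, *q)
        for taxon in q:
            totals[taxon] += value
            counts[taxon] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Hypothetical toy data: an exactly additive four-taxon tree ((0,1),(2,3)) ...
tree_d = [[0, 2, 4, 4],
          [2, 0, 4, 4],
          [4, 4, 0, 2],
          [4, 4, 2, 0]]
# ... and the same matrix with one distance inflated to break additivity.
noisy_d = [[0, 2, 4, 6],
           [2, 0, 4, 4],
           [4, 4, 0, 2],
           [6, 4, 2, 0]]
print(delta_values(tree_d), delta_values(noisy_d))
```

Taxa with a high mean delta are the candidates for removal that the abstract describes, since they take part in the least treelike quartets.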