1.

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 (total citations: 1; self-citations: 0; citations by others: 1)

Michael I Love Wolfgang Huber Simon Anders 《Genome biology》2014,15(12)

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.
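To illustrate what shrinking a fold change means, the sketch below applies a generic zero-centred normal prior to an observed log2 fold change. It is not the DESeq2 implementation; the function name `shrink_lfc`, the prior variance and the example numbers are assumptions chosen only for illustration.

```python
def shrink_lfc(lfc, se, prior_var):
    """Posterior mean of a log2 fold change under a zero-centred normal prior.

    lfc: observed log2 fold change (assumed approximately normal)
    se: standard error of that estimate
    prior_var: assumed variance of the zero-centred normal prior
    """
    # The weight approaches 1 for precise estimates and 0 for noisy ones.
    weight = prior_var / (prior_var + se ** 2)
    return weight * lfc


# Two hypothetical genes with the same observed fold change
# but very different precision.
well_measured = shrink_lfc(lfc=2.0, se=0.2, prior_var=1.0)
noisy = shrink_lfc(lfc=2.0, se=2.0, prior_var=1.0)
print(round(well_measured, 3), round(noisy, 3))
```

The noisy estimate is pulled much closer to zero than the precise one, which is the behaviour that makes shrunken fold changes comparable across genes with different count levels.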

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0550-8) contains supplementary material, which is available to authorized users.

2.

Transcriptome dynamics of a desert poplar (Populus pruinosa) in response to continuous salinity stress

Jian Zhang Dechun Jiang Bingbing Liu Wenchun Luo Jing Lu Tao Ma Dongshi Wan 《Plant cell reports》2014,33(9):1565-1579

3.

Skeletal muscle alterations and exercise performance decrease in erythropoietin-deficient mice: a comparative study

Laurence Mille-Hamard Veronique L Billat Elodie Henry Blandine Bonnamy Florence Joly Philippe Benech Eric Barrey 《BMC medical genomics》2012,5(1):1-20

4.

Identification of gene signatures from RNA-seq data using Pareto-optimal cluster algorithm

Saurav Mallik Zhongming Zhao 《BMC systems biology》2018,12(8):126

Background

Gene signatures are important for representing the molecular changes in disease genomes or in cells under specific conditions, and have often been used to separate samples into different groups for better research or clinical treatment. Although many methods and applications are available in the literature, there is still a lack of powerful ones that can take account of the complex data and detect the most informative signatures.

Methods

In this article, we present a new framework for identifying gene signatures using Pareto-optimal cluster size identification for RNA-seq data. We first performed pre-filtering and normalization, then used the empirical Bayes test in the Limma package to identify differentially expressed genes (DEGs). Next, we applied a multi-objective optimization technique, “Multi-objective optimization for collecting cluster alternatives” (the MOCCA R package), to these DEGs to find the Pareto-optimal cluster size, and then applied k-means clustering to the RNA-seq data based on that cluster size. The best cluster was obtained by computing the average pairwise Spearman correlation score among all genes belonging to the module. The best cluster is treated as the signature for the respective disease or cellular condition.
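The last step of the method, scoring each cluster by its average pairwise Spearman correlation, can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names, the toy expression profiles, and the choice of the highest-scoring cluster as "best" are assumptions.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def mean_pairwise_spearman(cluster):
    """Average Spearman correlation over all gene pairs in one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Hypothetical clusters: each inner list is one gene's expression profile.
clusters = {
    "A": [[1, 2, 3, 4], [2, 4, 5, 9], [0, 1, 3, 8]],
    "B": [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]],
}
best = max(clusters, key=lambda name: mean_pairwise_spearman(clusters[name]))
print(best)
```

Cluster "A" wins here because all of its profiles rise together, whereas "B" mixes increasing and decreasing genes.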

Results

We applied our framework to a cervical cancer RNA-seq dataset, which included 253 squamous cell carcinoma (SCC) samples and 22 adenocarcinoma (ADENO) samples. We identified a total of 582 DEGs by Limma analysis of SCC versus ADENO samples. Among them, 260 are up-regulated genes and 322 are down-regulated genes. Using MOCCA, we obtained seven Pareto-optimal clusters. The best cluster has a total of 35 DEGs consisting of all-upregulated genes. For validation, we ran PAMR (prediction analysis for microarrays) classifier on the selected best cluster, and assessed the classification performance. Our evaluation, measured by sensitivity, specificity, precision, and accuracy, showed high confidence.
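The four evaluation measures named above follow directly from a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With strongly unbalanced groups, as in a dataset with many more samples of one class than the other, accuracy alone can look high even when the minority class is classified poorly, which is why the four measures are usually reported together.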

Conclusions

Our framework identified a multi-objective based cluster that is treated as a signature and that can classify the disease and control groups of samples with high classification performance (accuracy 0.935) for the corresponding disease. Our method is useful for finding signatures in any RNA-seq or microarray data.

5.

Model-specific tests on variance heterogeneity for detection of potentially interacting genetic loci

Hothorn Ludwig A Libiger Ondrej Gerhard Daniel 《BMC genetics》2012,13(1):1-6

6.

NormalizeMets: assessing, selecting and implementing statistical methods for normalizing metabolomics data

Alysha M. De Livera Gavriel Olshansky Julie A. Simpson Darren J. Creek 《Metabolomics : Official journal of the Metabolomic Society》2018,14(5):54

Introduction

In metabolomics studies, unwanted variation inevitably arises from various sources. Normalization, that is the removal of unwanted variation, is an essential step in the statistical analysis of metabolomics data. However, metabolomics normalization is often considered an imprecise science due to the diverse sources of variation and the availability of a number of alternative strategies that may be implemented.

Objectives

We highlight the need for comparative evaluation of different normalization methods and present software strategies to help ease this task for both data-oriented and biological researchers.

Methods

We present NormalizeMets—a joint graphical user interface within the familiar Microsoft Excel and freely-available R software for comparative evaluation of different normalization methods. The NormalizeMets R package along with the vignette describing the workflow can be downloaded from https://cran.r-project.org/web/packages/NormalizeMets/. The Excel Interface and the Excel user guide are available on https://metabolomicstats.github.io/ExNormalizeMets.

Results

NormalizeMets allows for comparative evaluation of normalization methods using criteria that depend on the given dataset and the ultimate research question. Hence it guides researchers to assess, select and implement a suitable normalization method using either the familiar Microsoft Excel and/or freely-available R software. In addition, the package can be used for visualisation of metabolomics data using interactive graphical displays and to obtain end statistical results for clustering, classification, biomarker identification adjusting for confounding variables, and correlation analysis.

Conclusion

NormalizeMets is designed for comparative evaluation of normalization methods, and can also be used to obtain end statistical results. The use of freely-available R software offers an attractive proposition for programming-oriented researchers, and the Excel interface offers a familiar alternative to most biological researchers. The package handles the data locally on the user’s own computer, allowing reproducible code to be stored locally.

7.

Lipoteichoic acid is an important microbe-associated molecular pattern of Lactobacillus rhamnosus GG

Claes Ingmar JJ Segers Marijke E Verhoeven Tine LA Dusselier Michiel Sels Bert F De Keersmaecker Sigrid CJ Vanderleyden Jos Lebeer Sarah 《Microbial cell factories》2012,11(1):1-8

Background

Receptors with a single transmembrane (TM) domain are essential for signal transduction across the cell membrane. NMR spectroscopy is a powerful tool for studying the structure of a single TM domain. The expression and purification of a TM domain in Escherichia coli (E. coli) is challenging due to its small molecular weight. Although ketosteroid isomerase (KSI) is a commonly used affinity tag for the expression and purification of short peptides, the KSI tag needs to be removed with the toxic reagent cyanogen bromide (CNBr).

Result

The purification of the TM domain of p75 neurotrophin receptor using a KSI tag with the introduction of a thrombin cleavage site is described herein. The recombinant fusion protein was refolded into micelles and was cleaved with thrombin. Studies showed that purified protein could be used for structural study using NMR spectroscopy.

Conclusions

These results provide another strategy for obtaining a single TM domain for structural studies without using toxic chemical digestion or acid to remove the fusion tag. The purified TM domain of p75 neurotrophin receptor will be useful for structural studies.

8.

PureCN: copy number calling and SNV classification using targeted short read sequencing

Markus Riester Angad P. Singh A. Rose Brannon Kun Yu Catarina D. Campbell Derek Y. Chiang Michael P. Morrissey 《Source code for biology and medicine》2016,11(1):13

Background

Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore, an algorithm that reliably classifies variants would be helpful for retrospective exploratory analyses. Contamination of tumor samples with normal cells results in differences in expected allelic fractions of germline and somatic variants, which can be exploited to accurately infer genotypes after adjusting for local copy number. However, existing algorithms for determining tumor purity, ploidy and copy number are not designed for unmatched short read sequencing data.
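The way normal-cell contamination separates germline from somatic allelic fractions can be made concrete with a small calculation. The sketch below uses the usual mixture model for a tumor sample and assumes diploid normal cells; it is not PureCN's code, and the function and parameter names are hypothetical.

```python
def expected_allelic_fraction(purity, tumor_copies, tumor_total_cn, normal_copies):
    """Expected variant allelic fraction in a tumor/normal cell mixture.

    purity: fraction of cells in the sample that are tumor cells
    tumor_copies: number of variant copies in each tumor cell
    tumor_total_cn: total copy number at the locus in tumor cells
    normal_copies: variant copies per normal cell
                   (0 for somatic, 1 for heterozygous germline)
    Normal cells are assumed to be diploid at the locus.
    """
    variant = purity * tumor_copies + (1 - purity) * normal_copies
    total = purity * tumor_total_cn + (1 - purity) * 2
    return variant / total


# Hypothetical sample with 60% purity at a copy-neutral diploid locus.
somatic = expected_allelic_fraction(0.6, tumor_copies=1, tumor_total_cn=2, normal_copies=0)
germline = expected_allelic_fraction(0.6, tumor_copies=1, tumor_total_cn=2, normal_copies=1)
print(round(somatic, 3), round(germline, 3))
```

In this example a heterozygous germline variant stays at 0.5 while a single-copy somatic variant drops to 0.3, and the gap changes with local copy number, which is why copy number has to be accounted for before classifying variants.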

Results

We describe a methodology and corresponding open source software for estimating tumor purity, copy number, loss of heterozygosity (LOH), and contamination, and for classification of single nucleotide variants (SNVs) by somatic status and clonality. This R package, PureCN, is optimized for targeted short read sequencing data, integrates well with standard somatic variant detection pipelines, and has support for matched and unmatched tumor samples. Accuracy is demonstrated on simulated data and on real whole exome sequencing data.

Conclusions

Our algorithm provides accurate estimates of tumor purity and ploidy, even if matched normal samples are not available. This in turn allows accurate classification of SNVs. The software is provided as open source (Artistic License 2.0) R/Bioconductor package PureCN (http://bioconductor.org/packages/PureCN/).

9.

Global transcriptome analysis of Clostridium thermocellum ATCC 27405 during growth on dilute acid pretreated Populus and switchgrass

Charlotte M Wilson Miguel Rodriguez Jr Courtney M Johnson Stanton L Martin Tzu Ming Chu Russ D Wolfinger Loren J Hauser Miriam L Land Dawn M Klingeman Mustafa H Syed Arthur J Ragauskas Timothy J Tschaplinski Jonathan R Mielenz Steven D Brown 《Biotechnology for biofuels》2013,6(1):179

10.

De novo transcriptome analysis reveals tissue-specific differences in gene expression in Salix arbutifolia

Guodong Rao Yanfei Zeng Jinkai Sui Jianguo Zhang 《Trees - Structure and Function》2016,30(5):1647-1655

11.

Profile and Time-Scale Dynamics of Differentially Expressed Genes in Transcriptome of Populus davidiana Under Drought Stress

Bong-Gyu Mun Adil Hussain Eung-Jun Park Sang-Uk Lee Arti Sharma Qari Muhammad Imran Ki-Hong Jung Byung-Wook Yun 《Plant Molecular Biology Reporter》2017,35(6):647-660

12.

De novo assembly and Characterisation of the Transcriptome during seed development, and generation of genic-SSR markers in Peanut (Arachis hypogaea L.)

Jianan Zhang Shan Liang Jialei Duan Jin Wang Silong Chen Zengshu Cheng Qiang Zhang Xuanqiang Liang Yurong Li 《BMC genomics》2012,13(1):1-6

13.

Genome-wide marker development for the wheat D genome based on single nucleotide polymorphisms identified from transcripts in the wild wheat progenitor Aegilops tauschii (total citations: 1; self-citations: 0; citations by others: 1)

Julio Cesar Masaru Iehisa Akifumi Shimizu Kazuhiro Sato Ryo Nishijima Kouhei Sakaguchi Ryusuke Matsuda Shuhei Nasuda Shigeo Takumi 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2014,127(2):261-271

14.

Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data

Nicolas J Parker Andrew G Parker 《Source code for biology and medicine》2008,3(1):1-10

Background

The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to search quickly a set of reads for near exact text matches.

Methods

A set of tools is provided to search a large data set of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC that can be used by anyone without prior knowledge of Linux and without having to install a Linux setup on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the data set and gathering counts of sequences in the reads.
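Searching a read set for near-exact text matches can be sketched as a sliding-window comparison that tolerates a few mismatches. This is an illustration of the idea only, not the published tools; the function names and the toy reads are hypothetical, and only substitutions are handled, not inserted or deleted bases.

```python
def within_mismatches(a, b, max_mismatches):
    """Return True if equal-length strings differ at no more than max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def find_near_matches(reads, query, max_mismatches=1):
    """Return (read_index, start) for every window of a read that nearly matches the query."""
    hits = []
    k = len(query)
    for read_index, read in enumerate(reads):
        for start in range(len(read) - k + 1):
            if within_mismatches(read[start:start + k], query, max_mismatches):
                hits.append((read_index, start))
    return hits


# Hypothetical reads for illustration.
reads = ["ACGTACGTTT", "TTACGAACGG", "GGGGGGGGGG"]
hits = find_near_matches(reads, "ACGTAC", max_mismatches=1)
print(hits)
```

Counting how often a sequence occurs in the reads is then just `len(hits)`. A brute-force scan like this is slow on large data sets, so a practical tool would need indexing or a faster string-matching method.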

Results

Demonstrations are given of the use of the tools to help with checking an assembly against the fragment data set; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension.

Conclusion

The additional information contained in a pyrophosphate sequencing data set beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.

15.

PedGenie: meta genetic association testing in mixed family and case-control designs

Karen Curtin Jathine Wong Kristina Allen-Brady Nicola J Camp 《BMC bioinformatics》2007,8(1):1-8

Background

With the increased availability of high throughput data, such as DNA microarray data, researchers are capable of producing large amounts of biological data. During the analysis of such data often there is the need to further explore the similarity of genes not only with respect to their expression, but also with respect to their functional annotation which can be obtained from Gene Ontology (GO).

Results

We present the freely available software package GOSim, which calculates the functional similarity of genes based on various information-theoretic similarity concepts for GO terms. GOSim extends existing tools by providing additional, recently developed functional similarity measures for genes. These can, for example, be used to cluster genes according to their biological function. Conversely, they can also be used to evaluate the homogeneity of a given grouping of genes with respect to their GO annotation. GOSim hence provides the researcher with a flexible and powerful tool to combine knowledge stored in GO with experimental data. It can be seen as complementary to other tools that, for instance, search for significantly overrepresented GO terms within a given group of genes.
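As an example of an information-theoretic similarity between ontology terms, the sketch below implements Resnik's measure: the information content of the most informative common ancestor. It does not use GOSim or real GO data; the toy ontology, the term probabilities and the function names are all assumptions for illustration.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
parents = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sugar_metabolism": ["metabolism"],
}

# Hypothetical probability that a gene is annotated to a term
# or to any of its descendants.
prob = {
    "root": 1.0,
    "metabolism": 0.5,
    "signaling": 0.5,
    "lipid_metabolism": 0.25,
    "sugar_metabolism": 0.25,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Rare terms carry more information: IC = -log p(term)."""
    return -math.log(prob[term])


def resnik_similarity(term_a, term_b):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(term) for term in common)


print(resnik_similarity("lipid_metabolism", "sugar_metabolism"))
print(resnik_similarity("lipid_metabolism", "signaling"))
```

Two terms that share only the root score zero, while terms that share a specific ancestor score higher. Gene-level similarity is then typically built by combining such term-level scores over all annotations of the two genes.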

Conclusion

GOSim is implemented as a package for the statistical computing environment R and is distributed under GPL within the CRAN project.

16.

A bedr way of genomic interval processing

Syed Haider Daryl Waggott Emilie Lalonde Clement Fung Fei-Fei Liu Paul C. Boutros 《Source code for biology and medicine》2016,11(1):14

Background

Next-generation sequencing is making it critical to robustly and rapidly handle genomic ranges within standard pipelines. Standard use-cases include annotating sequence ranges with gene or other genomic annotation, merging multiple experiments together and subsequently quantifying and visualizing the overlap. The most widely-used tools for these tasks work at the command-line (e.g. BEDTools) and the small number of available R packages are either slow or have distinct semantics and features from command-line interfaces.

Results

To provide a robust R-based interface to standard command-line tools for genomic coordinate manipulation, we created bedr. This open-source R package can use either BEDTools or BEDOPS as a back-end and performs data-manipulation extremely quickly, creating R data structures that can be readily interfaced with existing computational pipelines. It includes data-visualization capabilities and a number of data-access functions that interface with standard databases like UCSC and COSMIC.
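A typical genomic-interval operation of the kind described here is merging overlapping regions. The sketch below shows the idea in plain Python; it does not call bedr, BEDTools or BEDOPS, and the function name and example regions are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (chrom, start, end) intervals.

    Coordinates are treated as half-open, so an interval ending at 200
    touches one starting at 200.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Extend the previous region instead of starting a new one.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


# Hypothetical regions for illustration.
regions = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr2", 50, 80),
    ("chr1", 400, 500),
]
print(merge_intervals(regions))
```

Sorting first means each interval only has to be compared with the most recently merged region, so the whole operation is dominated by the cost of the sort.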

Conclusions

The bedr package provides an open-source solution for genomic interval data manipulation and restructuring in the R programming language, which is commonly used in bioinformatics, and should therefore be useful to bioinformaticians and genomic researchers.

17.

Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data

Ágnes Baross Allen D Delaney H Irene Li Tarun Nayar Stephane Flibotte Hong Qian Susanna Y Chan Jennifer Asano Adrian Ally Manqiu Cao Patricia Birch Mabel Brown-John Nicole Fernandes Anne Go Giulia Kennedy Sylvie Langlois Patrice Eydoux JM Friedman Marco A Marra 《BMC bioinformatics》2007,8(1):1-18

Background

Genomic deletions and duplications are important in the pathogenesis of diseases, such as cancer and mental retardation, and have recently been shown to occur frequently in unaffected individuals as polymorphisms. Affymetrix GeneChip whole genome sampling analysis (WGSA) combined with 100 K single nucleotide polymorphism (SNP) genotyping arrays is one of several microarray-based approaches that are now being used to detect such structural genomic changes. The popularity of this technology and its associated open source data format have resulted in the development of an increasing number of software packages for the analysis of copy number changes using these SNP arrays.

Results

We evaluated four publicly available software packages for high throughput copy number analysis using synthetic and empirical 100 K SNP array data sets, the latter obtained from 107 mental retardation (MR) patients and their unaffected parents and siblings. We evaluated the software with regards to overall suitability for high-throughput 100 K SNP array data analysis, as well as effectiveness of normalization, scaling with various reference sets and feature extraction, as well as true and false positive rates of genomic copy number variant (CNV) detection.

Conclusion

We observed considerable variation among the numbers and types of candidate CNVs detected by different analysis approaches, and found that multiple programs were needed to find all real aberrations in our test set. The frequency of false positive deletions was substantial, but could be greatly reduced by using the SNP genotype information to confirm loss of heterozygosity.
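The idea of confirming a candidate deletion through loss of heterozygosity can be sketched as a simple filter on the SNP genotype calls inside the region: a true single-copy deletion should show no heterozygous calls. This is an illustrative assumption about how such a check might look, not the procedure used in the study; the genotype labels, threshold and function name are hypothetical.

```python
def supports_deletion(genotypes, max_het_fraction=0.0):
    """Return True if genotype calls in a region are consistent with loss of heterozygosity.

    genotypes: calls such as "AA", "AB", "BB"; anything else is treated as a no-call
    max_het_fraction: tolerated share of heterozygous ("AB") calls (assumed threshold)
    """
    called = [g for g in genotypes if g in ("AA", "AB", "BB")]
    if not called:
        # Without any usable calls the deletion cannot be confirmed.
        return False
    het_fraction = called.count("AB") / len(called)
    return het_fraction <= max_het_fraction


# Hypothetical candidate regions.
print(supports_deletion(["AA", "BB", "AA", "NoCall"]))
print(supports_deletion(["AA", "AB", "BB", "AB"]))
```

A nonzero `max_het_fraction` would allow for occasional genotyping errors, at the cost of letting through more false positive deletions.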

18.

Potential molecular characteristics in situ in response to repetitive UVB irradiation

Wenqi Chen Jinhai Zhang 《Diagnostic pathology》2016,11(1):129

19.

Experimental validation of methods for differential gene expression analysis and sample pooling in RNA-seq

Anto P. Rajkumar Per Qvist Ross Lazarus Francesco Lescai Jia Ju Mette Nyegaard Ole Mors Anders D. Børglum Qibin Li Jane H. Christensen 《BMC genomics》2015,16(1)

Background

Massively parallel cDNA sequencing (RNA-seq) experiments are gradually superseding microarrays in quantitative gene expression profiling. However, many biologists are uncertain about the choice of differentially expressed gene (DEG) analysis methods and the validity of cost-saving sample pooling strategies for their RNA-seq experiments. Hence, we performed experimental validation of DEGs identified by Cuffdiff2, edgeR, DESeq2 and the Two-stage Poisson Model (TSPM) in an RNA-seq experiment involving mouse amygdala micro-punches, using high-throughput qPCR on independent biological replicate samples. Moreover, we sequenced RNA pools and compared the results with those from sequencing the corresponding individual RNA samples.

Results

The false-positivity rate of Cuffdiff2 and the false-negativity rates of DESeq2 and TSPM were high. Among the four investigated DEG analysis methods, the sensitivity and specificity of edgeR were relatively high. We documented the pooling bias and found that the DEGs identified in pooled samples suffered from low positive predictive values.

Conclusions

Our results highlighted the need for combined use of more sensitive DEG analysis methods and high-throughput validation of identified DEGs in future RNA-seq experiments. They indicated limited utility of sample pooling strategies for RNA-seq in similar setups and supported increasing the number of biological replicate samples.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1767-y) contains supplementary material, which is available to authorized users.

20.

SRAdb: query and use public next-generation sequencing data from within R

Yuelin Zhu Robert M Stephens Paul S Meltzer Sean R Davis 《BMC bioinformatics》2013,14(1):1-4

Background

The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), Roche 454 GS System, Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others.

Results

SRAdb is an attempt to make queries of the metadata associated with SRA submission, study, sample, experiment and run more robust and precise, and make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then utilizes this SQLite database for querying and accessing metadata. Full-text search functionality makes querying metadata very flexible and powerful. Fastq files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrative Genomics Viewer.
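Querying sequencing metadata held in a local SQLite file can be sketched with Python's standard `sqlite3` module. This does not use SRAdb or its actual schema: the table name, columns, accessions and rows below are hypothetical, and a simple `LIKE` substring match stands in for full-text search.

```python
import sqlite3

# Build a small in-memory database with a hypothetical metadata table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run_metadata (run_accession TEXT, platform TEXT, study_title TEXT)"
)
conn.executemany(
    "INSERT INTO run_metadata VALUES (?, ?, ?)",
    [
        ("RUN0001", "ILLUMINA", "Breast cancer transcriptome"),
        ("RUN0002", "LS454", "Soil metagenome survey"),
        ("RUN0003", "ILLUMINA", "Lung cancer exome"),
    ],
)


def search_runs(connection, keyword):
    """Return run accessions whose study title contains the keyword."""
    rows = connection.execute(
        "SELECT run_accession FROM run_metadata "
        "WHERE study_title LIKE ? ORDER BY run_accession",
        (f"%{keyword}%",),  # parameter binding avoids SQL injection
    )
    return [row[0] for row in rows]


cancer_runs = search_runs(conn, "cancer")
print(cancer_runs)
```

Because the whole database is a single file, it can be copied or downloaded and queried offline, which matches the distribution model described above.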

Conclusions

The SRAdb Bioconductor package provides a convenient and integrated framework to query and access SRA metadata quickly and powerfully from within R.