期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Dispersion Estimation and Its Effect on Test Performance in RNA-seq Data Analysis: A Simulation-Based Comparison of Methods

William Michael Landau Peng Liu 《PloS one》2013,8(12)

A central goal of RNA sequencing (RNA-seq) experiments is to detect differentially expressed genes. In the ubiquitous negative binomial model for RNA-seq data, each gene is given a dispersion parameter, and correctly estimating these dispersion parameters is vital to detecting differential expression. Since the dispersions control the variances of the gene counts, underestimation may lead to false discovery, while overestimation may lower the rate of true detection. After briefly reviewing several popular dispersion estimation methods, this article describes a simulation study that compares them in terms of point estimation and the effect on the performance of tests for differential expression. The methods that maximize the test performance are the ones that use a moderate degree of dispersion shrinkage: the DSS, Tagwise wqCML, and Tagwise APL. In practical RNA-seq data analysis, we recommend using one of these moderate-shrinkage methods with the QLShrink test in QuasiSeq R package. 相似文献

2.

Gene ontology analysis for RNA-seq: accounting for selection bias 总被引：13，自引：0，他引：13

Matthew D Young Matthew J Wakefield Gordon K Smyth Alicia Oshlack 《Genome biology》2010,11(2):R14

相似文献

3.

Differential expression analyses for single-cell RNA-Seq: old questions on new data

Zhun Miao Xuegong Zhang 《Quantitative Biology.》2016,4(4):243

Background: Single-cell RNA sequencing (scRNA-seq) is an emerging technology that enables high resolution detection of heterogeneities between cells. One important application of scRNA-seq data is to detect differential expression (DE) of genes. Currently, some researchers still use DE analysis methods developed for bulk RNA-Seq data on single-cell data, and some new methods for scRNA-seq data have also been developed. Bulk and single-cell RNA-seq data have different characteristics. A systematic evaluation of the two types of methods on scRNA-seq data is needed. Results: In this study, we conducted a series of experiments on scRNA-seq data to quantitatively evaluate 14 popular DE analysis methods, including both of traditional methods developed for bulk RNA-seq data and new methods specifically designed for scRNA-seq data. We obtained observations and recommendations for the methods under different situations. Conclusions: DE analysis methods should be chosen for scRNA-seq data with great caution with regard to different situations of data. Different strategies should be taken for data with different sample sizes and/or different strengths of the expected signals. Several methods for scRNA-seq data show advantages in some aspects, and DEGSeq tends to outperform other methods with respect to consistency, reproducibility and accuracy of predictions on scRNA-seq data. 相似文献

4.

Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package

Sonia Tarazona Pedro Furió-Tarí David Turrà Antonio Di Pietro María José Nueda Alberto Ferrer Ana Conesa 《Nucleic acids research》2015,43(21):e140

As the use of RNA-seq has popularized, there is an increasing consciousness of the importance of experimental design, bias removal, accurate quantification and control of false positives for proper data analysis. We introduce the NOISeq R-package for quality control and analysis of count data. We show how the available diagnostic tools can be used to monitor quality issues, make pre-processing decisions and improve analysis. We demonstrate that the non-parametric NOISeqBIO efficiently controls false discoveries in experiments with biological replication and outperforms state-of-the-art methods. NOISeq is a comprehensive resource that meets current needs for robust data-aware analysis of RNA-seq differential expression. 相似文献

5.

A semi-parametric statistical model for integrating gene expression profiles across different platforms

Lyu Yafei Li Qunhua 《BMC bioinformatics》2016,17(1):51-60

Determining differentially expressed genes (DEGs) between biological samples is the key to understand how genotype gives rise to phenotype. RNA-seq and microarray are two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the two technologies. Integration data across these two platforms has the potential to improve the power and reliability of DEG detection. We propose a rank-based semi-parametric model to determine DEGs using information across different sources and apply it to the integration of RNA-seq and microarray data. By incorporating both the significance of differential expression and the consistency across platforms, our method effectively detects DEGs with moderate but consistent signals. We demonstrate the effectiveness of our method using simulation studies, MAQC/SEQC data and a synthetic microRNA dataset. Our integration method is not only robust to noise and heterogeneity in the data, but also adaptive to the structure of data. In our simulations and real data studies, our approach shows a higher discriminate power and identifies more biologically relevant DEGs than eBayes, DEseq and some commonly used meta-analysis methods. 相似文献

6.

A statistical framework for eQTL mapping using RNA-seq data

Sun W 《Biometrics》2012,68(1):1-11

RNA-seq may replace gene expression microarrays in the near future. Using RNA-seq, the expression of a gene can be estimated using the total number of sequence reads mapped to that gene, known as the total read count (TReC). Traditional expression quantitative trait locus (eQTL) mapping methods, such as linear regression, can be applied to TReC measurements after they are properly normalized. In this article, we show that eQTL mapping, by directly modeling TReC using discrete distributions, has higher statistical power than the two-step approach: data normalization followed by linear regression. In addition, RNA-seq provides information on allele-specific expression (ASE) that is not available from microarrays. By combining the information from TReC and ASE, we can computationally distinguish cis- and trans-eQTL and further improve the power of cis-eQTL mapping. Both simulation and real data studies confirm the improved power of our new methods. We also discuss the design issues of RNA-seq experiments. Specifically, we show that by combining TReC and ASE measurements, it is possible to minimize cost and retain the statistical power of cis-eQTL mapping by reducing sample size while increasing the number of sequence reads per sample. In addition to RNA-seq data, our method can also be employed to study the genetic basis of other types of sequencing data, such as chromatin immunoprecipitation followed by DNA sequencing data. In this article, we focus on eQTL mapping of a single gene using the association-based method. However, our method establishes a statistical framework for future developments of eQTL mapping methods using RNA-seq data (e.g., linkage-based eQTL mapping), and the joint study of multiple genetic markers and/or multiple genes. 相似文献

7.

Designing a transcriptome next-generation sequencing project for a nonmodel plant species1

Strickler SR Bombarely A Mueller LA 《American journal of botany》2012,99(2):257-266

相似文献

8.

Detecting cell-type-specific allelic expression imbalance by integrative analysis of bulk and single-cell RNA sequencing data

Jiaxin Fan Xuran Wang Rui Xiao Mingyao Li 《PLoS genetics》2021,17(3)

相似文献

9.

A Note on an Exon-Based Strategy to Identify Differentially Expressed Genes in RNA-Seq Experiments

Asta Laiho Laura L. Elo 《PloS one》2014,9(12)

相似文献

10.

SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

Jing Qi Yang Zhou Zicen Zhao Shuilin Jin 《PLoS computational biology》2021,17(6)

The single-cell RNA sequencing (scRNA-seq) technologies obtain gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. As the low amount of extracted mRNA copies per cell, scRNA-seq data exhibit a large number of dropouts, which hinders the downstream analysis of the scRNA-seq data. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies the dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. In the experiments, the results of the simulated datasets and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with the state-of-the-art imputation methods, SDImpute improves the accuracy of the downstream analysis including clustering, visualization, and differential expression analysis. 相似文献

11.

ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data

Brett A. McKinney Bill C. White Diane E. Grill Peter W. Li Richard B. Kennedy Gregory A. Poland Ann L. Oberg 《PloS one》2013,8(12)

相似文献

12.

eQTL Mapping Using RNA-seq Data

Wei Sun Yijuan Hu 《Statistics in biosciences》2013,5(1):198-219

相似文献

13.

Computational analysis of bacterial RNA-Seq data

Ryan McClure Divya Balasubramanian Yan Sun Maksym Bobrovskyy Paul Sumby Caroline A. Genco Carin K. Vanderpool Brian Tjaden 《Nucleic acids research》2013,41(14):e140

相似文献

14.

Bias and Correction in RNA-seq Data for Marine Species

Kai Song Li Li Guofan Zhang 《Marine biotechnology (New York, N.Y.)》2017,19(5):541-550

相似文献

15.

MBASED: allele-specific expression detection in cancer tissues and cell lines

Oleg Mayba Houston N Gilbert Jinfeng Liu Peter M Haverty Suchit Jhunjhunwala Zhaoshi Jiang Colin Watanabe Zemin Zhang 《Genome biology》2014,15(8):1-21

Allele-specific gene expression, ASE, is an important aspect of gene regulation. We developed a novel method MBASED, meta-analysis based allele-specific expression detection for ASE detection using RNA-seq data that aggregates information across multiple single nucleotide variation loci to obtain a gene-level measure of ASE, even when prior phasing information is unavailable. MBASED is capable of one-sample and two-sample analyses and performs well in simulations. We applied MBASED to a panel of cancer cell lines and paired tumor-normal tissue samples, and observed extensive ASE in cancer, but not normal, samples, mainly driven by genomic copy number alterations. 相似文献

16.

Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability

Karolis Uziela Antti Honkela 《PloS one》2015,10(5)

Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present, the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method we call PREBS is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions, and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from TCGA LAML data set we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined based on PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package “prebs.” 相似文献

17.

A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae

Intawat Nookaew Marta Papini Natapol Pornputtapong Gionata Scalcinati Linn Fagerberg Matthias Uhl��n Jens Nielsen 《Nucleic acids research》2012,40(20):10084-10097

相似文献

18.

Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization

Zhilong Jia Xiang Zhang Naiyang Guan Xiaochen Bo Michael R. Barnes Zhigang Luo 《PloS one》2015,10(9)

相似文献

19.

RNASEQR--a streamlined and accurate RNA-seq sequence analysis program

Chen LY Wei KC Huang AC Wang K Huang CY Yi D Tang CY Galas DJ Hood LE 《Nucleic acids research》2012,40(6):e42

相似文献

20.

A Regression-Based Differential Expression Detection Algorithm for Microarray Studies with Ultra-Low Sample Size

Daniel Vasiliu Samuel Clamons Molly McDonough Brian Rabe Margaret Saha 《PloS one》2015,10(3)

Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality. 相似文献