Found 20 similar articles (search time: 390 ms)
1.
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.
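As an illustration of what "shrinkage estimation" of fold changes means in general, the sketch below applies textbook normal-normal shrinkage to a single log fold change. It does not reproduce DESeq2's actual procedure (DESeq2 is an R/Bioconductor package with its own model fitting); the function name, the zero-centered normal prior, and the parameter `prior_variance` are assumptions chosen only for this example.

```python
def shrink_log_fold_change(lfc, standard_error, prior_variance):
    """Posterior mean of a log fold change under an assumed N(0, prior_variance) prior.

    Illustrative only: the observed estimate is modeled as normal around the
    true value with the given standard error, so the posterior mean is the
    observed estimate multiplied by a weight between 0 and 1.
    """
    if standard_error < 0 or prior_variance <= 0:
        raise ValueError("standard_error must be >= 0 and prior_variance > 0")
    # Noisy estimates (large standard error) get a small weight and are
    # pulled strongly toward zero; precise estimates are left nearly unchanged.
    weight = prior_variance / (prior_variance + standard_error ** 2)
    return weight * lfc
```

Under these assumptions, a gene with few reads and a large standard error has its estimate pulled toward zero, while a well-measured gene keeps an estimate close to the observed one. This is the sense in which shrinkage shifts attention to the strength of differential expression rather than its mere presence.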
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0550-8) contains supplementary material, which is available to authorized users.
2.
3.
4.
5.
6.
7.
Zhenqiang Su Hong Fang Huixiao Hong Leming Shi Wenqian Zhang Wenwei Zhang Yanyan Zhang Zirui Dong Lee J Lancashire Marina Bessarabova Xi Yang Baitang Ning Binsheng Gong Joe Meehan Joshua Xu Weigong Ge Roger Perkins Matthias Fischer Weida Tong 《Genome biology》2014,15(12)
Background
Gene expression microarray has been the primary biomarker platform ubiquitously applied in biomedical research, resulting in an enormous accumulation of data, predictive models, and biomarkers. Recently, RNA-seq has looked likely to replace microarrays, but there will be a period where both technologies co-exist. This raises two important questions: Can microarray-based models and biomarkers be directly applied to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment?
Results
We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, and three levels of mapping complexity were revealed. Three algorithms representing different modeling complexity were applied to the three levels of mappings for each of the eight binary endpoints and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined.
Conclusions
Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples, whereas RNA-seq-based models are less accurate in predicting microarray-profiled samples and are affected both by the choice of modeling algorithm and the gene mapping complexity. The results suggest continued usefulness of legacy microarray data and established microarray biomarkers and predictive models in the forthcoming RNA-seq era.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0523-y) contains supplementary material, which is available to authorized users.
8.
9.
Nikolaos I Panousis Maria Gutierrez-Arcelus Emmanouil T Dermitzakis Tuuli Lappalainen 《Genome biology》2014,15(9)
Background
RNA sequencing (RNA-seq) is the current gold-standard method to quantify gene expression for expression quantitative trait locus (eQTL) studies. However, a potential caveat in these studies is that RNA-seq reads carrying the non-reference allele of variant loci can have a lower probability of mapping correctly to the reference genome, which could bias gene quantifications and cause false positive eQTL associations. In this study, we analyze the effect of this allelic mapping bias in eQTL discovery.
Results
We simulate RNA-seq read mapping over 9.5 M common SNPs and indels, with 15.6% of variants showing biased mapping rate for reference versus non-reference reads. However, removing potentially biased RNA-seq reads from an eQTL dataset of 185 individuals has a very small effect on gene and exon quantifications and eQTL discovery. We detect only a handful of likely false positive eQTLs, and overall eQTL SNPs show no significant enrichment for high mapping bias.
Conclusion
Our results suggest that RNA-seq quantifications are generally robust against allelic mapping bias, and that this does not have a severe effect on eQTL discovery. Nevertheless, we provide our catalog of putatively biased loci to allow better controlling for mapping bias to obtain more accurate results in future RNA-seq studies.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0467-2) contains supplementary material, which is available to authorized users.
10.
11.
12.
Mhawech-Fauceglia P Wang D Kesterson J Clark K Monhollen L Odunsi K Lele S Liu S 《PloS one》2010,5(11):e15415
Background
Endometrial cancer is the most common gynecologic malignancy in developed countries and little is known about the underlying mechanism of stage and disease outcomes. The goal of this study was to identify differentially expressed genes (DEG) between late vs. early stage endometrioid adenocarcinoma (EAC) and uterine serous carcinoma (USC), as well as between disease outcomes in each of the two histological subtypes.
Methodology/Principal Finding
Gene expression profiles of 20 cancer samples were analyzed (EAC = 10, USC = 10) using human genome-wide Illumina bead microarrays. There was little overlap in the DEG sets between late vs. early stages in EAC and USC, and there was an insignificant overlap in DEG sets between good and poor prognosis in EAC and USC. Remarkably, there was no overlap between the stage-derived DEGs and the prognosis-derived DEGs for each of the two histological subtypes. Further functional annotation of differentially expressed genes showed that the composition of enriched function terms differed among the DEG sets. Gene expression differences for selected genes of various stages and outcomes were confirmed by qRT-PCR with a high validation rate.
Conclusion
These data, although preliminary, suggest that distinct groups of genes might be involved in tumor progression (late vs. early stage) in each of EAC and USC. They also suggest that these genes are different from those involved in tumor outcome (good vs. poor prognosis). These genes, once clinically verified, may be important for predicting tumor progression and tumor outcome.
13.
14.
15.
16.
Background
Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase the power to detect biological signals or patterns in datasets. However, when using publicly available databases for meta-analysis, duplication of samples is a frequently encountered problem, especially for gene expression data. Failing to remove duplicates could lead to false positive findings, misleading clustering patterns, or model over-fitting in the subsequent data analysis.
Results
We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data. A real data example demonstrates the usage and output of the package.
Conclusions
Researchers may not pay enough attention to checking for and removing duplicated samples, and the resulting data contamination could make the results or conclusions of a meta-analysis questionable. We suggest applying DupChecker to examine all gene expression data sets before any data analysis step.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-323) contains supplementary material, which is available to authorized users.
17.
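The MD5 fingerprinting idea described in the DupChecker abstract (entry 16) can be sketched in a few lines. This is a minimal Python illustration of the general technique, not the DupChecker package itself, which is an R/Bioconductor package; the function names and the sample file names mentioned afterwards are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_of_bytes(data):
    """Return the hex MD5 digest of an in-memory byte string."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(fingerprints):
    """Group sample names that share a fingerprint.

    `fingerprints` maps a sample name to its MD5 digest. Only groups with
    more than one member are returned, each sorted by name.
    """
    groups = defaultdict(list)
    for name, digest in fingerprints.items():
        groups[digest].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

For example, building `fingerprints` with `md5_of_file` over the raw files of several downloaded data sets (say, hypothetical `study1/sample_a.raw` and `study2/sample_x.raw`) and passing it to `group_duplicates` lists the files whose contents are byte-for-byte identical. A fingerprint of this kind only catches exact copies; files that were re-processed or re-encoded would not match.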
18.
19.
David Mosen-Ansorena Naiara Telleria Silvia Veganzones Virginia De la Orden Maria Luisa Maestro Ana M Aransay 《BMC genomics》2014,15(1)
Background
Deviations in the amount of genomic content that arise during tumorigenesis, called copy number alterations, are structural rearrangements that can critically affect gene expression patterns. Additionally, copy number alteration profiles allow insight into cancer discrimination, progression and complexity. On data obtained from high-throughput sequencing, improving quality through GC bias correction and keeping false positives to a minimum help build reliable copy number alteration profiles.
Results
We introduce seqCNA, a parallelized R package for an integral copy number analysis of high-throughput sequencing cancer data. The package includes novel methodology on (i) filtering, reducing false positives, and (ii) GC content correction, improving copy number profile quality, especially under great read coverage and high correlation between GC content and copy number. Adequate analysis steps are automatically chosen based on availability of paired-end mapping, matched normal samples and genome annotation.
Conclusions
seqCNA, available through Bioconductor, provides accurate copy number predictions in tumoural data, thanks to the extensive filtering and better GC bias correction, while providing an integrated and parallelized workflow.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-178) contains supplementary material, which is available to authorized users.
20.