首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 390 毫秒
1.
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0550-8) contains supplementary material, which is available to authorized users.  相似文献   

2.
3.
4.
5.
6.
7.

Background

Gene expression microarray has been the primary biomarker platform ubiquitously applied in biomedical research, resulting in enormous data, predictive models, and biomarkers accrued. Recently, RNA-seq has looked likely to replace microarrays, but there will be a period where both technologies co-exist. This raises two important questions: Can microarray-based models and biomarkers be directly applied to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment?

Results

We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, and three levels of mapping complexity were revealed. Three algorithms representing different modeling complexity were applied to the three levels of mappings for each of the eight binary endpoints and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined.

Conclusions

Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples; while RNA-seq-based models are less accurate in predicting microarray-profiled samples and are affected both by the choice of modeling algorithm and the gene mapping complexity. The results suggest continued usefulness of legacy microarray data and established microarray biomarkers and predictive models in the forthcoming RNA-seq era.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0523-y) contains supplementary material, which is available to authorized users.  相似文献   

8.
9.

Background

RNA sequencing (RNA-seq) is the current gold-standard method to quantify gene expression for expression quantitative trait locus (eQTL) studies. However, a potential caveat in these studies is that RNA-seq reads carrying the non-reference allele of variant loci can have lower probability to map correctly to the reference genome, which could bias gene quantifications and cause false positive eQTL associations. In this study, we analyze the effect of this allelic mapping bias in eQTL discovery.

Results

We simulate RNA-seq read mapping over 9.5 M common SNPs and indels, with 15.6% of variants showing biased mapping rate for reference versus non-reference reads. However, removing potentially biased RNA-seq reads from an eQTL dataset of 185 individuals has a very small effect on gene and exon quantifications and eQTL discovery. We detect only a handful of likely false positive eQTLs, and overall eQTL SNPs show no significant enrichment for high mapping bias.

Conclusion

Our results suggest that RNA-seq quantifications are generally robust against allelic mapping bias, and that this does not have a severe effect on eQTL discovery. Nevertheless, we provide our catalog of putatively biased loci to allow better controlling for mapping bias to obtain more accurate results in future RNA-seq studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0467-2) contains supplementary material, which is available to authorized users.  相似文献   

10.
11.
12.

Background

Endometrial cancer is the most common gynecologic malignancy in developed countries and little is known about the underlying mechanism of stage and disease outcomes. The goal of this study was to identify differentially expressed genes (DEG) between late vs. early stage endometrioid adenocarcinoma (EAC) and uterine serous carcinoma (USC), as well as between disease outcomes in each of the two histological subtypes.

Methodology/Principal Finding

Gene expression profiles of 20 cancer samples were analyzed (EAC = 10, USC = 10) using the human genome wide illumina bead microarrays. There was little overlap in the DEG sets between late vs. early stages in EAC and USC, and there was an insignificant overlap in DEG sets between good and poor prognosis in EAC and USC. Remarkably, there was no overlap between the stage-derived DEGs and the prognosis-derived DEGs for each of the two histological subtypes. Further functional annotation of differentially expressed genes showed that the composition of enriched function terms were different among different DEG sets. Gene expression differences for selected genes of various stages and outcomes were confirmed by qRT-PCR with a high validation rate.

Conclusion

This data, although preliminary, suggests that there might be involvement of distinct groups of genes in tumor progression (late vs. early stage) in each of the EAC and USC. It also suggests that these genes are different from those involved in tumor outcome (good vs. poor prognosis). These involved genes, once clinically verified, may be important for predicting tumor progression and tumor outcome.  相似文献   

13.
14.
15.
16.

Background

Meta-analysis has become a popular approach for high-throughput genomic data analysis because it often can significantly increase power to detect biological signals or patterns in datasets. However, when using public-available databases for meta-analysis, duplication of samples is an often encountered problem, especially for gene expression data. Not removing duplicates could lead false positive finding, misleading clustering pattern or model over-fitting issue, etc in the subsequent data analysis.

Results

We developed a Bioconductor package Dupchecker that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data. A real data example was demonstrated to show the usage and output of the package.

Conclusions

Researchers may not pay enough attention to checking and removing duplicated samples, and then data contamination could make the results or conclusions from meta-analysis questionable. We suggest applying DupChecker to examine all gene expression data sets before any data analysis step.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-323) contains supplementary material, which is available to authorized users.  相似文献   

17.
18.
19.

Background

Deviations in the amount of genomic content that arise during tumorigenesis, called copy number alterations, are structural rearrangements that can critically affect gene expression patterns. Additionally, copy number alteration profiles allow insight into cancer discrimination, progression and complexity. On data obtained from high-throughput sequencing, improving quality through GC bias correction and keeping false positives to a minimum help build reliable copy number alteration profiles.

Results

We introduce seqCNA, a parallelized R package for an integral copy number analysis of high-throughput sequencing cancer data. The package includes novel methodology on (i) filtering, reducing false positives, and (ii) GC content correction, improving copy number profile quality, especially under great read coverage and high correlation between GC content and copy number. Adequate analysis steps are automatically chosen based on availability of paired-end mapping, matched normal samples and genome annotation.

Conclusions

seqCNA, available through Bioconductor, provides accurate copy number predictions in tumoural data, thanks to the extensive filtering and better GC bias correction, while providing an integrated and parallelized workflow.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-178) contains supplementary material, which is available to authorized users.  相似文献   

20.

Background

Scanning force microscopy (SFM) allows direct, rapid and high-resolution visualization of single molecular complexes; irregular shapes and differences in sizes are immediately revealed by the scanning tip in three-dimensional images. However, high-throughput analysis of SFM data is limited by the lack of versatile software tools accessible to SFM users. Most existing SFM software tools are aimed at broad general use: from material-surface analysis to visualization of biomolecules.

Results

We present SFMetrics as a metrology toolbox for SFM, specifically aimed at biomolecules like DNA and proteins, which features (a) semi-automatic high-throughput analysis of individual molecules; (b) ease of use working within MATLAB environment or as a stand-alone application; (c) compatibility with MultiMode (Bruker), NanoWizard (JPK instruments), Asylum (Asylum research), ASCII, and TIFF files, that can be adjusted with minor modifications to other formats.

Conclusion

Assembled in a single user interface, SFMetrics serves as a semi-automatic analysis tool capable of measuring several geometrical properties (length, volume and angles) from DNA and protein complexes, but is also applicable to other samples with irregular shapes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0457-8) contains supplementary material, which is available to authorized users.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号