High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines to highlight that the choice of tools significantly affects the final results.
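The last pipeline step named in this abstract, filtering of SNP candidates, can be illustrated with a minimal Python sketch. The abstract does not say which filters the authors apply, so the record layout (SnpCandidate), the function name filter_candidates, and the quality and depth thresholds below are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SnpCandidate:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # Phred-scaled call quality
    depth: int   # number of reads covering the site


def filter_candidates(candidates, min_qual=30.0, min_depth=10):
    """Keep candidates that pass both (hypothetical) thresholds."""
    return [c for c in candidates if c.qual >= min_qual and c.depth >= min_depth]


# Toy candidates (hypothetical values, not taken from any dataset).
candidates = [
    SnpCandidate("chr1", 1000, "A", "G", qual=50.0, depth=40),
    SnpCandidate("chr1", 2000, "C", "T", qual=12.0, depth=35),  # low call quality
    SnpCandidate("chr2", 3000, "G", "A", qual=60.0, depth=4),   # low coverage
]

passed = filter_candidates(candidates)
print([(c.chrom, c.pos) for c in passed])
```

In practice the thresholds would depend on the caller and the sequencing depth; the sketch only shows the shape of a hard-filter step.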

6.

Genome-wide identification and functional analysis of Apobec-1-mediated C-to-U RNA editing in mouse small intestine and liver

Valerie Blanc Eddie Park Sabine Schaefer Melanie Miller Yiing Lin Susan Kennedy Anja M Billing Hisham Ben Hamidane Johannes Graumann Ali Mortazavi Joseph H Nadeau Nicholas O Davidson 《Genome biology》2014,15(6):R79

7.

Comparison of RNA-Seq by poly (A) capture, ribosomal RNA depletion, and DNA microarray for expression profiling

Wei Zhao Xiaping He Katherine A Hoadley Joel S Parker David Neil Hayes Charles M Perou 《BMC genomics》2014,15(1)

8.

Evaluating the Impact of Sequencing Depth on Transcriptome Profiling in Human Adipose

Yichuan Liu Jane F. Ferguson Chenyi Xue Ian M. Silverman Brian Gregory Muredach P. Reilly Mingyao Li 《PloS one》2013,8(6)

9.

UnSplicer: mapping spliced RNA-seq reads in compact genomes and filtering noisy splicing

Paul D. Burns Yang Li Jian Ma Mark Borodovsky 《Nucleic acids research》2014,42(4):e25

Accurate mapping of spliced RNA-Seq reads to genomic DNA is known to be a challenging problem. Despite significant efforts invested in developing efficient algorithms, with the human genome as a primary focus, the best solution is still not known. A recently introduced tool, TrueSight, has demonstrated better performance than earlier algorithms such as TopHat and MapSplice. To improve detection of splice junctions, TrueSight uses information on statistical patterns of nucleotide ordering in intronic and exonic DNA. This line of research led to yet another new algorithm, UnSplicer, designed for eukaryotic species with compact genomes where functional alternative splicing is likely to be dominated by splicing noise. Genome-specific parameters of the new algorithm are generated by GeneMark-ES, an ab initio gene prediction algorithm based on unsupervised training. UnSplicer shares several components with TrueSight; the difference lies in the training strategy and the classification algorithm. We tested UnSplicer on RNA-Seq data sets of Arabidopsis thaliana, Caenorhabditis elegans, Cryptococcus neoformans and Drosophila melanogaster. We have shown that splice junctions inferred by UnSplicer are in better agreement with knowledge accumulated on these well-studied genomes than predictions made by earlier tools.

10.

MiST: A new approach to variant detection in deep sequencing datasets

Sailakshmi Subramanian Valentina Di Pierro Hardik Shah Anitha D. Jayaprakash Ian Weisberger Jaehee Shim Ajish George Bruce D. Gelb Ravi Sachidanandam 《Nucleic acids research》2013,41(16):e154

MiST is a novel approach to variant calling from deep sequencing data, using the inverted mapping approach developed for Geoseq. Reads that can map to a targeted exonic region are identified using exact matches to tiles from the region. The reads are then aligned to the targets to discover variants. MiST carefully handles paralogous reads that map ambiguously to the genome and clonal reads arising from PCR bias, which are the two major sources of errors in variant calling. The reduced computational complexity of mapping selected reads to targeted regions of the genome improves speed, specificity and sensitivity of variant detection. Compared with variant calls from the GATK platform, MiST showed better concordance with SNPs from dbSNP and genotypes determined by an exonic-SNP array. Variant calls made only by MiST were confirmed at a high rate (>90%) by Sanger sequencing. Thus, MiST is a valuable alternative tool for analysing variants in deep sequencing data.
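The read-selection idea in this abstract, identifying reads through exact matches to tiles from a target region, can be sketched in a few lines of Python. This is not MiST's implementation: the tile length, the function names build_tile_index and select_reads, and the toy sequences are assumptions for illustration, and only the forward strand is considered.

```python
from collections import defaultdict


def build_tile_index(target, tile_len):
    """Map every tile (substring of length tile_len) of the target to its start offsets."""
    index = defaultdict(list)
    for start in range(len(target) - tile_len + 1):
        index[target[start:start + tile_len]].append(start)
    return index


def select_reads(reads, index, tile_len):
    """Keep reads that share at least one exact tile with the target region."""
    selected = []
    for read in reads:
        for pos in range(len(read) - tile_len + 1):
            if read[pos:pos + tile_len] in index:
                selected.append(read)
                break  # one shared tile is enough to keep the read
    return selected


# Toy data (hypothetical): one target region and three reads.
target = "ACGTTGCATGCCGTA"
reads = [
    "GTTGCATG",  # exact substring of the target
    "GTTGCTTG",  # carries a mismatch but still shares the tile "GTTGC"
    "TTTTTTTT",  # unrelated read
]

index = build_tile_index(target, tile_len=5)
candidates = select_reads(reads, index, tile_len=5)
print(candidates)
```

Only the selected reads would then go on to the more expensive alignment step, which is where the reduction in computational cost described in the abstract comes from.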

11.

Mapping and quantifying mammalian transcriptomes by RNA-Seq (total citations: 43; self-citations: 0; citations by others: 43)

Mortazavi A Williams BA McCue K Schaeffer L Wold B 《Nature methods》2008,5(7):621-628

12.

SNP-based large-scale identification of allele-specific gene expression in human B cells

Song MY Kim HE Kim S Choi IH Lee JK 《Gene》2012,493(2):211-218

Polymorphism and variations in gene expression provide the genetic basis for human variation. Allelic variation of gene expression, in particular, may play a crucial role in phenotypic variation and disease susceptibility. To identify genes with allelic expression in human cells, we genotyped genomic DNA and cDNA isolated from 31 immortalized B cell lines from three Centre d'Etude du Polymorphisme Humain (CEPH) families using high-density single-nucleotide polymorphism (SNP) chips containing 13,900 exonic SNPs. We identified seven SNPs in five genes with monoallelic expression and 146 SNPs in 125 genes with allelic imbalance, that is, preferentially higher expression of one allele in a heterozygous individual. The monoallelically expressed genes (ERAP2, MDGA1, LOC644422, SDCCAG3P1 and CLTCL1) were regulated by cis-acting, non-imprinted differential allelic control. In addition, all monoallelic gene expression patterns and allelic imbalances in gene expression in B cells were transmitted from parents to offspring in the pedigree, indicating genetic transmission of allelic gene expression. Furthermore, frequent allele substitution, probably due to RNA editing, was also observed at 23 SNPs in 21 genes as well as at 48 SNPs located in regions containing no known genes. In this study, we demonstrated that allelic gene expression is frequently observed in human B cells, and that SNP chips are very useful tools for detecting allelic gene expression. Overall, our data provide a valuable framework for better understanding allelic gene expression in human B cells.

13.

Large-scale detection and analysis of RNA editing in grape mtDNA by RNA deep-sequencing

Ernesto Picardi David S. Horner Matteo Chiara Riccardo Schiavon Giorgio Valle Graziano Pesole 《Nucleic acids research》2010,38(14):4755-4767

14.

RNA-Seq Uncovers SNPs and Alternative Splicing Events in Asian Lotus (Nelumbo nucifera)

Mei Yang Liming Xu Yanling Liu Pingfang Yang 《PloS one》2015,10(4)

15.

Modelling and simulating generic RNA-Seq experiments with the flux simulator

Thasso Griebel Benedikt Zacher Paolo Ribeca Emanuele Raineri Vincent Lacroix Roderic Guigó Michael Sammeth 《Nucleic acids research》2012,40(20):10073-10083

16.

Analyzing allele specific RNA expression using mixture models

Rong Lu Ryan M Smith Michal Seweryn Danxin Wang Katherine Hartmann Amy Webb Wolfgang Sadee Grzegorz A Rempala 《BMC genomics》2015,16(1)

17.

EditPredict: Prediction of RNA editable sites with convolutional neural network

《Genomics》2021,113(6):3864-3871

RNA editing exerts critical impacts on numerous biological processes. While millions of RNA editing events have been identified in humans, many more are expected to be discovered. In this work, we constructed Convolutional Neural Network (CNN) models to predict human RNA editing events in both Alu regions and non-Alu regions. With a validation dataset resulting from CRISPR/Cas9 knockout of the ADAR1 enzyme, the validation accuracies reached 99.5% and 93.6% for Alu and non-Alu regions, respectively. We deployed our CNN models as a web service named EditPredict. EditPredict not only works on reference genome sequences but can also take into consideration single nucleotide variants in personal genomes. In addition to the human genome, EditPredict handles other model organisms, including the bumblebee, fruit fly, mouse, and squid genomes. EditPredict can be used stand-alone to predict novel RNA editing events, and it can help filter candidate RNA editing events detected from RNA-Seq data.

18.

Assessment of the Impact of Using a Reference Transcriptome in Mapping Short RNA-Seq Reads

Shanrong Zhao 《PloS one》2014,9(7)

19.

A new strategy to reduce allelic bias in RNA-Seq readmapping

Vijaya Satya R Zavaljevski N Reifman J 《Nucleic acids research》2012,40(16):e127

Accurate estimation of expression levels from RNA-Seq data entails precise mapping of the sequence reads to a reference genome. Because the standard reference genome contains only one allele at any given locus, reads overlapping polymorphic loci that carry a non-reference allele are at least one mismatch away from the reference and, hence, are less likely to be mapped. This bias in read mapping leads to inaccurate estimates of allele-specific expression (ASE). To address this read-mapping bias, we propose the construction of an enhanced reference genome that includes the alternative alleles at known polymorphic loci. We show that mapping to this enhanced reference reduced the read-mapping biases, leading to more reliable estimates of ASE. Experiments on simulated data show that the proposed strategy reduced the number of loci with mapping bias by ≥63% when compared with a previous approach that relies on masking the polymorphic loci and by ≥18% when compared with the standard approach that uses an unaltered reference. When we applied our strategy to actual RNA-Seq data, we found that it mapped up to 15% more reads than the previous approaches and identified many seemingly incorrect inferences made by them.
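The mapping bias described in this abstract can be reproduced with a toy Python example. It is a sketch under stated assumptions, not the authors' method: the ungapped Hamming-distance matcher stands in for a real aligner, the enhanced reference is represented simply as an extra sequence carrying the alternative allele, and the sequences, SNP position, and mismatch budget are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def maps_to(read, references, max_mismatches):
    """True if the read aligns without gaps to any reference within the mismatch budget."""
    for ref in references:
        for start in range(len(ref) - len(read) + 1):
            if hamming(read, ref[start:start + len(read)]) <= max_mismatches:
                return True
    return False


def with_alt_allele(reference, pos, alt):
    """Copy of the reference carrying the alternative allele at a 0-based position."""
    return reference[:pos] + alt + reference[pos + 1:]


reference = "ACGTACGGTCAT"   # hypothetical reference with a known SNP at position 6 (G > A)
standard = [reference]
enhanced = [reference, with_alt_allele(reference, 6, "A")]

# Both reads cover the SNP and carry the same sequencing error in their last base.
ref_read = "TACGGTCT"  # reference allele + 1 error
alt_read = "TACAGTCT"  # alternative allele + 1 error

for name, refs in (("standard", standard), ("enhanced", enhanced)):
    print(name,
          "ref read maps:", maps_to(ref_read, refs, max_mismatches=1),
          "alt read maps:", maps_to(alt_read, refs, max_mismatches=1))
```

With a budget of one mismatch, the read carrying the alternative allele is lost against the standard reference because the allele itself already consumes that budget, whereas both reads map against the enhanced reference.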

20.

Allelic mapping bias in RNA-sequencing is not a major confounder in eQTL studies

Nikolaos I Panousis Maria Gutierrez-Arcelus Emmanouil T Dermitzakis Tuuli Lappalainen 《Genome biology》2014,15(9)

Background

RNA sequencing (RNA-seq) is the current gold-standard method to quantify gene expression for expression quantitative trait locus (eQTL) studies. However, a potential caveat in these studies is that RNA-seq reads carrying the non-reference allele of variant loci can have lower probability to map correctly to the reference genome, which could bias gene quantifications and cause false positive eQTL associations. In this study, we analyze the effect of this allelic mapping bias in eQTL discovery.

Results

We simulate RNA-seq read mapping over 9.5 M common SNPs and indels, with 15.6% of variants showing biased mapping rate for reference versus non-reference reads. However, removing potentially biased RNA-seq reads from an eQTL dataset of 185 individuals has a very small effect on gene and exon quantifications and eQTL discovery. We detect only a handful of likely false positive eQTLs, and overall eQTL SNPs show no significant enrichment for high mapping bias.

Conclusion

Our results suggest that RNA-seq quantifications are generally robust against allelic mapping bias, and that this does not have a severe effect on eQTL discovery. Nevertheless, we provide our catalog of putatively biased loci to allow better controlling for mapping bias to obtain more accurate results in future RNA-seq studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0467-2) contains supplementary material, which is available to authorized users.