1.

A beginners guide to SNP calling from high-throughput DNA-sequencing data

A Altmann P Weber D Bader M Preuß EB Binder B Müller-Myhsok 《Human genetics》2012,131(10):1541-1554

High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines for highlighting the fact that the choice of the tools significantly affects the final results.
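The last pipeline step mentioned in this abstract, filtering of SNP candidates, can be illustrated with a minimal sketch. The `SnpCandidate` record, the function names, and the thresholds (`min_qual=30.0`, `min_depth=10`) are assumptions made for illustration; they are not taken from the paper or from any particular SNP caller.

```python
from dataclasses import dataclass


@dataclass
class SnpCandidate:
    """Hypothetical record for one candidate SNP (not a real caller's format)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # Phred-scaled call quality
    depth: int   # number of reads covering the site


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality to an error probability: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def filter_candidates(candidates, min_qual=30.0, min_depth=10):
    """Keep candidates that meet both illustrative thresholds."""
    return [c for c in candidates
            if c.qual >= min_qual and c.depth >= min_depth]


candidates = [
    SnpCandidate("chr1", 1000, "A", "G", qual=50.0, depth=35),  # passes
    SnpCandidate("chr1", 2000, "C", "T", qual=12.0, depth=40),  # low quality
    SnpCandidate("chr2", 3000, "G", "A", qual=60.0, depth=4),   # low depth
]
kept = filter_candidates(candidates)
print([(c.chrom, c.pos) for c in kept])  # [('chr1', 1000)]
```

Hard thresholds like these are only one possible filtering strategy; suitable values depend on coverage and on the caller used.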

2.

GxGrare: gene-gene interaction analysis method for rare variants from high-throughput sequencing data

Minseok Kwon Sangseob Leem Joon Yoon Taesung Park 《BMC systems biology》2018,12(2):19

Background

With the rapid advancement of array-based genotyping techniques, genome-wide association studies (GWAS) have successfully identified common genetic variants associated with common complex diseases. However, it has been shown that only a small proportion of the genetic etiology of complex diseases could be explained by the genetic factors identified from GWAS. This missing heritability could possibly be explained by gene-gene interaction (epistasis) and rare variants. There has been an exponential growth of gene-gene interaction analysis for common variants in terms of methodological developments and practical applications. Also, the recent advancement of high-throughput sequencing technologies makes it possible to conduct rare variant analysis. However, little progress has been made in gene-gene interaction analysis for rare variants.

Results

Here, we propose GxGrare, a new gene-gene interaction method for rare variants in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of three steps: 1) collapsing the rare variants, 2) MDR analysis of the collapsed rare variants, and 3) detecting the top candidate interaction pairs. GxGrare can be used to detect not only gene-gene interactions but also interactions within a single gene. The proposed method is illustrated with whole exome sequencing data from 1080 samples of the Korean population in order to identify causal gene-gene interactions among rare variants for type 2 diabetes.
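The three steps named above can be sketched roughly as follows. This is not the GxGrare implementation: the collapsing rule (a gene is flagged if a sample carries any rare allele in it), the omission of cross-validation, and all function and variable names are simplifying assumptions for illustration.

```python
from itertools import combinations


def collapse(genotypes, gene_to_variants):
    """Step 1 (assumed rule): per sample, flag a gene as 1 if any of its
    rare variants has a non-zero alternate allele count, else 0."""
    return [
        {gene: int(any(sample.get(v, 0) > 0 for v in variants))
         for gene, variants in gene_to_variants.items()}
        for sample in genotypes
    ]


def mdr_balanced_accuracy(collapsed, is_case, gene_a, gene_b):
    """Step 2: label each genotype cell high-risk when its case:control ratio
    exceeds the overall ratio, then score the resulting classifier."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    cells = {}
    for row, case in zip(collapsed, is_case):
        counts = cells.setdefault((row[gene_a], row[gene_b]), [0, 0])
        counts[0 if case else 1] += 1
    tp = fp = 0
    for cases, ctrls in cells.values():
        # cases/ctrls > n_case/n_ctrl, written without division
        if cases * n_ctrl > ctrls * n_case:
            tp += cases
            fp += ctrls
    sensitivity = tp / n_case
    specificity = (n_ctrl - fp) / n_ctrl
    return (sensitivity + specificity) / 2


def rank_pairs(collapsed, is_case):
    """Step 3: rank all gene pairs by balanced accuracy, best first."""
    genes = sorted(collapsed[0])
    scored = [((a, b), mdr_balanced_accuracy(collapsed, is_case, a, b))
              for a, b in combinations(genes, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: variant -> alternate allele count, one dict per sample.
gene_to_variants = {"G1": ["v1", "v2"], "G2": ["v3"], "G3": ["v4"]}
genotypes = [
    {"v1": 1, "v3": 1}, {"v2": 1, "v3": 1}, {"v1": 1, "v3": 1}, {"v4": 1},  # cases
    {"v1": 1}, {"v3": 1}, {}, {"v4": 1},                                    # controls
]
is_case = [True] * 4 + [False] * 4

ranked = rank_pairs(collapse(genotypes, gene_to_variants), is_case)
print(ranked[0])  # (('G1', 'G2'), 0.875)
```

A fuller MDR analysis would estimate accuracy by cross-validation rather than on the training data, which this sketch leaves out for brevity.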

Conclusion

The proposed GxGrare performs well for gene-gene interaction detection with collapsing of rare variants. GxGrare is available at http://bibs.snu.ac.kr/software/gxgrare which contains simulation data and documentation. Supported operating systems include Linux and OS X.

3.

MeQA: a pipeline for MeDIP-seq data quality assessment and analysis

Huang J Renault V Sengenès J Touleimat N Michel S Lathrop M Tost J 《Bioinformatics (Oxford, England)》2012,28(4):587-588

4.

TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data

Asmann YW Middha S Hossain A Baheti S Li Y Chai HS Sun Z Duffy PH Hadad AA Nair A Liu X Zhang Y Klee EW Kalari KR Kocher JP 《Bioinformatics (Oxford, England)》2012,28(2):277-278

TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways. AVAILABILITY AND IMPLEMENTATION: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website: http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.

5.

Cloud-scale RNA-sequencing differential expression analysis with Myrna

Ben Langmead Kasper D Hansen Jeffrey T Leek 《Genome biology》2010,11(8):1-11

6.

Integrating Multiple Genomic Data to Predict Disease-Causing Nonsynonymous Single Nucleotide Variants in Exome Sequencing Studies

Jiaxin Wu Yanda Li Rui Jiang 《PLoS genetics》2014,10(3)

Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, due to such facts as the large number of sequenced variants, the presence of non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, prevalent applications of exome sequencing have been appealing for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance styles. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.

7.

A workflow for mutation extraction and structure annotation

Kanagasabai R Choo KH Ranganathan S Baker CJ 《Journal of bioinformatics and computational biology》2007,5(6):1319-1337

Rich information on point mutation studies is scattered across heterogeneous data sources. This paper presents an automated workflow for mining mutation annotations from full-text biomedical literature using natural language processing (NLP) techniques as well as for their subsequent reuse in protein structure annotation and visualization. This system, called mSTRAP (Mutation extraction and STRucture Annotation Pipeline), is designed for both information aggregation and subsequent brokerage of the mutation annotations. It facilitates the coordination of semantically related information from a series of text mining and sequence analysis steps into a formal OWL-DL ontology. The ontology is designed to support application-specific data management of sequence, structure, and literature annotations that are populated as instances of object and data type properties. mSTRAPviz is a subsystem that facilitates the brokerage of structure information and the associated mutations for visualization. For mutated sequences without any corresponding structure available in the Protein Data Bank (PDB), an automated pipeline for homology modeling is developed to generate the theoretical model. With mSTRAP, we demonstrate a workable system that can facilitate automation of the workflow for the retrieval, extraction, processing, and visualization of mutation annotations -- tasks which are well known to be tedious, time-consuming, complex, and error-prone. The ontology and visualization tool are available at (http://datam.i2r.a-star.edu.sg/mstrap).

8.

A graph-based approach for designing extensible pipelines

MR Rodrigues WC Magalhaes M Machado E Tarazona-Santos 《BMC bioinformatics》2012,13(1):163

9.

MuPIT interactive: webserver for mapping variant positions to annotated, interactive 3D structures

Noushin Niknafs Dewey Kim RyangGuk Kim Mark Diekhans Michael Ryan Peter D. Stenson David N. Cooper Rachel Karchin 《Human genetics》2013,132(11):1235-1243

Mutation position imaging toolbox (MuPIT) interactive is a browser-based application for single-nucleotide variants (SNVs), which automatically maps the genomic coordinates of SNVs onto the coordinates of available three-dimensional (3D) protein structures. The application is designed for interactive browser-based visualization of the putative functional relevance of SNVs by biologists who are not necessarily experts either in bioinformatics or protein structure. Users may submit batches of several thousand SNVs and review all protein structures that cover the SNVs, including available functional annotations such as binding sites, mutagenesis experiments, and common polymorphisms. Multiple SNVs may be mapped onto each structure, enabling 3D visualization of SNV clusters and their relationship to functionally annotated positions. We illustrate the utility of MuPIT interactive in rationalizing the impact of selected polymorphisms in the PharmGKB database, somatic mutations identified in the Cancer Genome Atlas study of invasive breast carcinomas, and rare variants identified in the exome sequencing project. MuPIT interactive is freely available for non-profit use at http://mupit.icm.jhu.edu.

10.

arrayMagic: two-colour cDNA microarray quality control and preprocessing

Buness A Huber W Steiner K Sültmann H Poustka A 《Bioinformatics (Oxford, England)》2005,21(4):554-556

11.

SNPAAMapper: An efficient genome-wide SNP variant analysis pipeline for next-generation sequencing data

Yongsheng Bai James Cavalcoli 《Bioinformation》2013,9(17):870-872

12.

GeneTrack--a genomic data processing and visualization framework

Albert I Wachi S Jiang C Pugh BF 《Bioinformatics (Oxford, England)》2008,24(10):1305-1306

MOTIVATION: High-throughput 'ChIP-chip' and 'ChIP-seq' methodologies generate sufficiently large data sets that analysis poses significant informatics challenges, particularly for research groups with modest computational support. To address this challenge, we devised a software platform for storing, analyzing and visualizing high resolution genome-wide binding data. GeneTrack automates several steps of a typical data processing pipeline, including smoothing and peak detection, and facilitates dissemination of the results via the web. Our software is freely available via the Google Project Hosting environment at http://genetrack.googlecode.com
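The smoothing and peak detection steps mentioned in this abstract can be illustrated generically. The sketch below is not GeneTrack's code: the Gaussian kernel, the `sigma` value, the height threshold, and all function names are assumptions chosen for a small self-contained example on per-position read counts.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma=2.0):
    """Convolve per-position counts with a Gaussian; positions outside the
    signal contribute nothing (zero padding)."""
    radius = int(3 * sigma)
    kernel = gaussian_kernel(sigma, radius)
    smoothed = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(signal):
                acc += w * signal[j]
        smoothed.append(acc)
    return smoothed


def find_peaks(smoothed, min_height):
    """Return indices of local maxima whose smoothed value reaches min_height."""
    return [
        i for i in range(1, len(smoothed) - 1)
        if smoothed[i] >= min_height
        and smoothed[i] > smoothed[i - 1]
        and smoothed[i] >= smoothed[i + 1]
    ]


# Toy binding profile: two strong sites and one isolated noisy read.
signal = [0] * 50
signal[10] = 30
signal[35] = 25
signal[22] = 1

print(find_peaks(smooth(signal), min_height=2.0))  # [10, 35]
```

The threshold suppresses the single noisy read at position 22, which is the practical reason for smoothing before calling peaks.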

13.

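A simple pipeline for 16S rRNA gene amplicon sequencing data analysis under Windows

Fang Meimei Wang Yuxuan Wang Mingyue Zheng Helong Zhang Lu Xiang Shasha Zhang Guoqing Li Yudong 《生物信息学》2018,16(4):239-245

Microbiome data analysis requires familiarity with the Linux operating system, which is a major barrier for biological researchers who lack computing knowledge. We therefore designed a simple pipeline for analyzing high-throughput 16S rRNA gene amplicon sequencing data under the Windows Subsystem for Linux (WSL). The pipeline integrates commonly used open-source software such as VSEARCH and QIIME, and performs quality control, OTU clustering, diversity analysis and visualization of results for 16S rRNA sequencing data. Using salivary microbiome analysis as an example, we describe in detail the parameters and commands used from the raw data through to the statistical analysis of diversity, together with the interpretation of the results. Teaching practice shows that the pipeline is easy to learn and helps users grasp the basic concepts and methods of microbiome research. By using the recent WSL feature of Windows, the pipeline gives Windows users convenient access to the many bioinformatics tools that run on Linux, which should help advance microbiome research. The installer and sequencing data can be downloaded free of charge from http://www.ligene.cn/win16s/.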
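The diversity analysis step of such a 16S rRNA pipeline can be illustrated with a small worked example. The sketch below computes two common alpha-diversity measures from a vector of OTU counts; it does not reproduce the paper's VSEARCH or QIIME commands, and the function names and toy counts are assumptions.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    if total == 0:
        return 0.0
    h = 0.0
    for count in otu_counts:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


def observed_otus(otu_counts):
    """Richness: number of OTUs seen at least once in the sample."""
    return sum(1 for count in otu_counts if count > 0)


# Two toy samples with the same richness but different evenness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

for name, counts in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, observed_otus(counts), round(shannon_index(counts), 3))
```

Both samples contain four OTUs, but the evenly distributed sample has the higher Shannon index (ln 4 for four equally abundant OTUs), which is why richness and evenness are usually reported together.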

14.

ChromaPipe: a pipeline for analysis, quality control and management for a DNA sequencing facility

Otto TD Vasconcellos EA Gomes LH Moreira AS Degrave WM Mendonça-Lima L Alves-Ferreira M 《Genetics and molecular research : GMR》2008,7(3):861-871

Optimizing and monitoring the data flow in high-throughput sequencing facilities is important for data input and output, for tracking the status of results for the users of the facility, and to guarantee a good, high-quality service. In a multi-user system environment with different throughputs, each user wants to access his/her data easily, track his/her sequencing history, analyze sequences and their quality, and apply some basic post-sequencing analysis, without the necessity of installing further software. Recently, Fiocruz established such a core facility as a "technological platform". Infrastructure includes a 48-capillary 3730 DNA Sequence Analyzer (Applied Biosystems) and supporting equipment. The service includes running samples for large-scale users, performing DNA sequencing reactions and runs for medium and small users, and participation in partial or full genome projects. We implemented a workflow that fulfills these requirements for small and high throughput users. Our implementation also includes the monitoring of data for continuous quality improvement (reports by plate, month and user) by the sequencing staff. For the user, different analyses of the chromatograms, such as visualization of good quality regions, as well as processing, such as comparisons or assemblies, are available. So far, 180 users have made use of the service, generating 155,000 sequences, 35% of which were produced for the BCG Moreau-RJ genome project. The pipeline (named ChromaPipe for Chromatogram Pipeline) is available for download by the scientific community at the URL http://bioinfo.pdtis.fiocruz.br/ChromaPipe/. The support for assembly is also configured as a web service: http://bioinfo.pdtis.fiocruz.br/Assembly/.

15.

Clinical analysis of germline copy number variation in DMD using a non-conjugate hierarchical Bayesian model

Velina Kozareva Clayton Stroff Maxwell Silver Jonathan F. Freidin Nigel F. Delaney 《BMC medical genomics》2018,11(1):91

Background

Detection of copy number variants (CNVs) is an important aspect of clinical testing for several disorders, including Duchenne muscular dystrophy, and is often performed using multiplex ligation-dependent probe amplification (MLPA). However, since many genetic carrier screens depend instead on next-generation sequencing (NGS) for wider discovery of small variants, they often do not include CNV analysis. Moreover, most computational techniques developed to detect CNVs from exome sequencing data are not suitable for carrier screening, as they require matched normals, very large cohorts, or extensive gene panels.

Methods

We present a computational software package, geneCNV (http://github.com/vkozareva/geneCNV), which can identify exon-level CNVs using exome sequencing data from only a few genes. The tool relies on a hierarchical parametric model trained on a small cohort of reference samples.
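To make the idea of exon-level CNV detection against a small reference cohort concrete, here is a deliberately simple depth-ratio sketch. It is not the hierarchical Bayesian model used by geneCNV: the normalization, the median baseline, the cut-offs (`del_max=0.7`, `dup_min=1.3`), and all names are assumptions for illustration only.

```python
from statistics import median


def normalize(depths):
    """Scale per-exon read depths so they sum to 1 within a sample."""
    total = sum(depths)
    return [d / total for d in depths]


def exon_copy_ratios(test_depths, reference_cohort):
    """Ratio of the test sample's normalized depth to the cohort median, per exon.
    Assumes every exon has non-zero depth in the reference cohort."""
    test = normalize(test_depths)
    refs = [normalize(sample) for sample in reference_cohort]
    ratios = []
    for i, value in enumerate(test):
        baseline = median(sample[i] for sample in refs)
        ratios.append(value / baseline)
    return ratios


def call_exons(ratios, del_max=0.7, dup_min=1.3):
    """Label each exon using illustrative cut-offs around the diploid ratio of 1."""
    calls = []
    for r in ratios:
        if r <= del_max:
            calls.append("DEL")
        elif r >= dup_min:
            calls.append("DUP")
        else:
            calls.append("NORMAL")
    return calls


# Toy data: three reference samples at different overall coverage,
# and a test sample with roughly half the expected depth on the third exon.
reference_cohort = [
    [100, 200, 150, 120, 180],
    [200, 400, 300, 240, 360],
    [50, 100, 75, 60, 90],
]
test_sample = [100, 200, 75, 120, 180]

print(call_exons(exon_copy_ratios(test_sample, reference_cohort)))
# ['NORMAL', 'NORMAL', 'DEL', 'NORMAL', 'NORMAL']
```

A fixed-threshold heuristic like this ignores per-exon noise, which is one motivation for a probabilistic model trained on reference samples.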

Results

Using geneCNV, we accurately inferred heterozygous CNVs in the DMD gene across a cohort of 15 test subjects. These results were validated against MLPA, the current standard for clinical CNV analysis in DMD. We also benchmarked the tool’s performance against other computational techniques and found comparable or improved CNV detection in DMD using data from panels ranging from 4,000 genes to as few as 8 genes.

Conclusions

geneCNV allows for the creation of cost-effective screening panels by allowing NGS sequencing approaches to generate results equivalent to bespoke genotyping assays like MLPA. By using a parametric model to detect CNVs, it also fulfills regulatory requirements to define a reference range for a genetic test. It is freely available and can be incorporated into any Illumina sequencing pipeline to create clinical assays for detection of exon duplications and deletions.

16.

CoBRA: Containerized Bioinformatics Workflow for Reproducible ChIP/ATAC-seq Analysis

Xintao Qiu Avery S. Feit Ariel Feiglin Yingtian Xie Nikolas Kesten Len Taing Joseph Perkins Shengqing Gu Yihao Li Paloma Cejas Ningxuan Zhou Rinath Jeselsohn Myles Brown X. Shirley Liu Henry W. Long 《基因组蛋白质组与生物信息学报(英文版)》2021,19(4):652-661

Chromatin immunoprecipitation sequencing (ChIP-seq) and the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) have become essential technologies to effectively measure protein–DNA interactions and chromatin accessibility. However, there is a need for a scalable and reproducible pipeline that incorporates proper normalization between samples, correction of copy number variations, and integration of new downstream analysis tools. Here we present Containerized Bioinformatics workflow for Reproducible ChIP/ATAC-seq Analysis (CoBRA), a modularized computational workflow which quantifies ChIP-seq and ATAC-seq peak regions and performs unsupervised and supervised analyses. CoBRA provides a comprehensive state-of-the-art ChIP-seq and ATAC-seq analysis pipeline that can be used by scientists with limited computational experience. This enables researchers to gain rapid insight into protein–DNA interactions and chromatin accessibility through sample clustering, differential peak calling, motif enrichment, comparison of sites to a reference database, and pathway analysis. CoBRA is publicly available online at https://bitbucket.org/cfce/cobra

17.

Accurate in silico confirmation of rare copy number variant calls from exome sequencing data using transfer learning

Renjie Tan Yufeng Shen 《Nucleic acids research》2022,50(21):e123

Exome sequencing is widely used in genetic studies of human diseases and clinical genetic diagnosis. Accurate detection of copy number variants (CNVs) is important to fully utilize exome sequencing data. However, exome data are noisy. None of the existing methods alone can achieve both high precision and recall rate. A common practice is to perform heuristic filtration followed by manual inspection of read depth of putative CNVs. This approach does not scale in large studies. To address this issue, we developed a transfer learning method, CNV-espresso, for in silico confirming rare CNVs from exome sequencing data. CNV-espresso encodes candidate CNVs from exome data as images and uses pretrained convolutional neural network models to classify copy number states. We trained CNV-espresso using an offspring–parents trio exome sequencing dataset, with inherited CNVs as positives and CNVs with Mendelian errors as negatives. We evaluated the performance using additional samples that have both exome and whole-genome sequencing (WGS) data. Assuming the CNVs detected from WGS data as a proxy of ground truth, CNV-espresso significantly improves precision while keeping recall almost intact, especially for CNVs that span a small number of exons. CNV-espresso can effectively replace manual inspection of CNVs in large-scale exome sequencing studies.

18.

Gel-free multiplexed reduced representation bisulfite sequencing for large-scale DNA methylation profiling

Patrick Boyle Kendell Clement Hongcang Gu Zachary D Smith Michael Ziller Jennifer L Fostel Laurie Holmes Jim Meldrim Fontina Kelley Andreas Gnirke Alexander Meissner 《Genome biology》2012,13(10):1-10

DNA methylation is an important epigenetic modification involved in gene regulation, which can now be measured using whole-genome bisulfite sequencing. However, cost, complexity of the data, and lack of comprehensive analytical tools are major challenges that keep this technology from becoming widely applied. Here we present BSmooth, an alignment, quality control and analysis pipeline that provides accurate and precise results even with low coverage data, appropriately handling biological replicates. BSmooth is open source software, and can be downloaded from http://rafalab.jhsph.edu/bsmooth.

19.

SESAME (SEquence Sorter & AMplicon Explorer): genotyping based on high-throughput multiplex amplicon sequencing

Meglécz E Piry S Desmarais E Galan M Gilles A Guivier E Pech N Martin JF 《Bioinformatics (Oxford, England)》2011,27(2):277-278

SUMMARY: Characterizing genetic diversity through genotyping short amplicons is central to evolutionary biology. Next-generation sequencing (NGS) technologies changed the scale at which this type of data is acquired. SESAME is a web application package that assists genotyping of multiplexed individuals for several markers based on NGS amplicon sequencing. It automatically assigns reads to loci and individuals, corrects reads if standard samples are available and provides an intuitive graphical user interface (GUI) for allele validation based on the sequences and associated decision-making tools. The aim of SESAME is to help allele identification among a large number of sequences. AVAILABILITY: SESAME and its documentation are freely available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Licence for Windows and Linux from http://www1.montpellier.inra.fr/CBGP/NGS/ or http://tinyurl.com/ngs-sesame.

20.

The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data

Xiaojia Tang Saurabh Baheti Khader Shameer Kevin J. Thompson Quin Wills Nifang Niu Ilona N. Holcomb Stephane C. Boutet Ramesh Ramakrishnan Jennifer M. Kachergus Jean-Pierre A. Kocher Richard M. Weinshilboum Liewei Wang E. Aubrey Thompson Krishna R. Kalari 《Nucleic acids research》2014,42(22):e172

Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA, but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell-line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6–96.8% precision and 91.6–95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MD-MB231 breast cancer cell-line delineated variant heterogeneity among the single-cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.
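The precision and sensitivity figures quoted in this abstract follow the usual definitions for comparing a call set against a reference set. The sketch below shows that arithmetic; the variant tuples and the function name are made up for illustration, and only the 29-out-of-31 Sanger validation count comes from the abstract.

```python
def precision_and_sensitivity(called, truth):
    """Precision = TP / calls made; sensitivity = TP / true variants."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


# Hypothetical variants keyed by (chromosome, position, ref, alt).
called = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr3", 10, "T", "C"), ("chr4", 75, "A", "C")}

precision, sensitivity = precision_and_sensitivity(called, truth)
print(round(precision, 3), round(sensitivity, 3))  # 0.667 0.5

# Validation rate implied by the Sanger result reported in the abstract.
print(round(29 / 31, 3))  # 0.935
```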