
1.

Integrating Multiple Genomic Data to Predict Disease-Causing Nonsynonymous Single Nucleotide Variants in Exome Sequencing Studies

Jiaxin Wu Yanda Li Rui Jiang 《PLoS genetics》2014,10(3)

Exome sequencing has been widely used to detect pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data because of the large number of sequenced variants, the presence of a non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. The widespread application of exome sequencing has therefore created a need for an effective computational method for identifying causative nonsynonymous SNVs among a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown, and effective for diseases with a variety of inheritance patterns. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.

2.

SAPTA: a new design tool for improving TALE nuclease activity

Yanni Lin Eli J. Fine Zhilan Zheng Christopher J. Antico Richard A. Voit Matthew H. Porteus Thomas J. Cradick Gang Bao 《Nucleic acids research》2014,42(6):e47


3.

Decoding the complex genetic causes of heart diseases using systems biology

Djordje Djordjevic Vinita Deshpande Tomasz Szczesnik Andrian Yang David T. Humphreys Eleni Giannoulatou Joshua W. K. Ho 《Biophysical reviews》2015,7(1):141-159


4.

SNPAAMapper: An efficient genome-wide SNP variant analysis pipeline for next-generation sequencing data

Yongsheng Bai James Cavalcoli 《Bioinformation》2013,9(17):870-872


5.

Secondary structure and domain architecture of the 23S and 5S rRNAs

Anton S. Petrov Chad R. Bernier Eli Hershkovits Yuzhen Xue Chris C. Waterbury Chiaolong Hsiao Victor G. Stepanov Eric A. Gaucher Martha A. Grover Stephen C. Harvey Nicholas V. Hud Roger M. Wartell George E. Fox Loren Dean Williams 《Nucleic acids research》2013,41(15):7522-7535

We present a de novo re-determination of the secondary (2°) structure and domain architecture of the 23S and 5S rRNAs, using 3D structures, determined by X-ray diffraction, as input. In the traditional 2° structure, the center of the 23S rRNA is an extended single strand, which in 3D is seen to be compact and double helical. Accurately assigning nucleotides to helices compels a revision of the 23S rRNA 2° structure. Unlike the traditional 2° structure, the revised 2° structure of the 23S rRNA shows architectural similarity with the 16S rRNA. The revised 2° structure also reveals a clear relationship with the 3D structure and is generalizable to rRNAs of other species from all three domains of life. The 2° structure revision required us to reconsider the domain architecture. We partitioned the 23S rRNA into domains through analysis of molecular interactions, calculations of 2D folding propensities and compactness. The best domain model for the 23S rRNA contains seven domains, not six as previously ascribed. Domain 0 forms the core of the 23S rRNA, to which the other six domains are rooted. Editable 2° structures mapped with various data are provided (http://apollo.chemistry.gatech.edu/RibosomeGallery).

6.

SavvyCNV: Genome-wide CNV calling from off-target reads

Thomas W. Laver Elisa De Franco Matthew B. Johnson Kashyap A. Patel Sian Ellard Michael N. Weedon Sarah E. Flanagan Matthew N. Wakeling 《PLoS computational biology》2022,18(3)

Identifying copy number variants (CNVs) can provide diagnoses to patients and important biological insights into human health and disease. Current exome and targeted sequencing approaches cannot detect clinically and biologically relevant CNVs outside their target area. We present SavvyCNV, a tool which uses off-target read data from exome and targeted sequencing data to call germline CNVs genome-wide. Up to 70% of sequencing reads from exome and targeted sequencing fall outside the targeted regions. We have developed a new tool, SavvyCNV, to exploit this ‘free data’ to call CNVs across the genome. We benchmarked SavvyCNV against five state-of-the-art CNV callers using truth sets generated from genome sequencing data and Multiplex Ligation-dependent Probe Amplification assays. SavvyCNV called CNVs with high precision and recall, outperforming the five other tools at calling CNVs genome-wide, using off-target or on-target reads from targeted panel and exome sequencing. We then applied SavvyCNV to clinical samples sequenced using a targeted panel and were able to call previously undetected clinically relevant CNVs, highlighting the utility of this tool within the diagnostic setting. SavvyCNV outperforms existing tools for calling CNVs from off-target reads. It can call CNVs genome-wide from targeted panel and exome data, increasing the utility and diagnostic yield of these tests. SavvyCNV is freely available at https://github.com/rdemolgen/SavvySuite.

7.

The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data

Xiaojia Tang Saurabh Baheti Khader Shameer Kevin J. Thompson Quin Wills Nifang Niu Ilona N. Holcomb Stephane C. Boutet Ramesh Ramakrishnan Jennifer M. Kachergus Jean-Pierre A. Kocher Richard M. Weinshilboum Liewei Wang E. Aubrey Thompson Krishna R. Kalari 《Nucleic acids research》2014,42(22):e172

Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA, but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed eSNV-Detect, a novel computational system that uses data from multiple aligners to call and rank variants from RNA-Seq, even at low read depths. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6–96.8% precision and 91.6–95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MDA-MB-231 breast cancer cell line delineated variant heterogeneity among the single cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue, validating 29 of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.
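
As a worked example of the metrics quoted above: precision is TP / (TP + FP) and sensitivity is TP / (TP + FN), so the Sanger validation of 29 of 31 candidate eSNVs corresponds to a confirmation rate of 29/31 ≈ 0.935, assuming the two unconfirmed candidates are counted as false positives. The short Python sketch below only restates these standard definitions; the function names are illustrative and are not part of the eSNV-Detect pipeline.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported variants that are true positives: TP / (TP + FP)."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were reported: TP / (TP + FN)."""
    return tp / (tp + fn)


# Sanger validation quoted in the abstract: 29 of 31 candidate eSNVs confirmed.
# Assumption: the 2 unconfirmed candidates are counted as false positives.
confirmed, candidates = 29, 31
print(f"{precision(tp=confirmed, fp=candidates - confirmed):.3f}")  # 0.935
```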

8.

DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels

Huiying Zhao Yuedong Yang Hai Lin Xinjun Zhang Matthew Mort David N Cooper Yunlong Liu Yaoqi Zhou 《Genome biology》2013,14(3):R23

Micro-indels (insertions or deletions shorter than 21 bp) constitute the second most frequent class of human gene mutation after single nucleotide variants. Despite the relative abundance of non-frameshifting indels, their damaging effect on protein structure and function has gone largely unstudied. We have developed a support vector machine-based method named DDIG-in (Detecting disease-causing genetic variations due to indels) to prioritize non-frameshifting indels by comparing disease-associated mutations with putatively neutral mutations from the 1000 Genomes Project. The final model gives good discrimination for indels and is robust against annotation errors. A webserver implementing DDIG-in is available at http://sparks-lab.org/ddig.
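
The variant class that DDIG-in targets can be expressed as a simple filter: a micro-indel changes the sequence length by fewer than 21 bp, and it is non-frameshifting when that length change is a multiple of three. The Python sketch below assumes VCF-style REF/ALT alleles (which share an anchor base) and uses the net length difference; the `Indel` class and the function name are hypothetical, and this is not DDIG-in's own code, which scores such variants with a support vector machine.

```python
from dataclasses import dataclass

MAX_MICRO_INDEL_BP = 20  # micro-indels are "shorter than 21 bp"


@dataclass(frozen=True)
class Indel:
    """Hypothetical container for a VCF-style variant (REF and ALT share an anchor base)."""
    ref: str
    alt: str

    @property
    def net_length(self) -> int:
        """Number of bases inserted or deleted."""
        return abs(len(self.alt) - len(self.ref))


def is_non_frameshifting_micro_indel(variant: Indel) -> bool:
    """True for indels of 1-20 bp whose length is a multiple of 3 (reading frame preserved)."""
    n = variant.net_length
    return 0 < n <= MAX_MICRO_INDEL_BP and n % 3 == 0


print(is_non_frameshifting_micro_indel(Indel("A", "ACAG")))  # True: 3-bp insertion
print(is_non_frameshifting_micro_indel(Indel("AT", "A")))    # False: 1-bp deletion shifts the frame
```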

9.

Inferring copy number and genotype in tumour exome data

Kaushalya C Amarasinghe Jason Li Sally M Hunter Georgina L Ryland Prue A Cowin Ian G Campbell Saman K Halgamuge 《BMC genomics》2014,15(1)

Background

Using whole exome sequencing to predict aberrations in tumours is a cost-effective alternative to whole genome sequencing; however, it is predominantly used for variant detection and only infrequently for detecting somatic copy number variation.

Results

We propose a new method to infer copy number and genotypes using whole exome data from paired tumour/normal samples. Our algorithm uses two Hidden Markov Models to predict copy number and genotypes and computationally resolves polyploidy/aneuploidy, normal cell contamination and signal baseline shift. Our method explicitly detects chromosome arm-level events, which are common in tumour samples. The methods are combined into a package named ADTEx (Aberration Detection in Tumour Exome). We applied our algorithm to a cohort of 17 in-house generated and 18 TCGA paired ovarian cancer/normal exomes and evaluated its performance against the copy number variations and genotypes predicted from Affymetrix SNP 6.0 data for the same samples. Further, a comparison study showed that ADTEx outperformed its competitors in terms of precision and F-measure.

Conclusions

Our proposed method, ADTEx, uses both depth-of-coverage ratios and B allele frequencies calculated from whole exome sequencing data to predict copy number variations along with their genotypes. ADTEx is implemented as a user-friendly software package using Python and the R statistical language. Source code and sample data are freely available under the GNU license (GPLv3) at http://adtex.sourceforge.net/.
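
The two input signals named above can be illustrated with a minimal Python sketch, assuming per-exon read depths, total mapped reads for each library, and reference/alternate read counts at SNP positions. Scaling by total mapped reads is an assumption made for illustration; ADTEx's own normalisation, baseline-shift correction and Hidden Markov Models are not reproduced here, and the function names are hypothetical.

```python
import math


def log2_depth_ratio(tumour_depth: float, normal_depth: float,
                     tumour_total: float, normal_total: float) -> float:
    """log2(tumour/normal) depth for one exon after scaling each by its library size."""
    tumour = tumour_depth / tumour_total
    normal = normal_depth / normal_total
    return math.log2(tumour / normal)


def b_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the B (alternate) allele at a SNP; NaN if uncovered."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else float("nan")


# Hypothetical exon: 100 tumour reads vs 200 normal reads, equal library sizes.
print(log2_depth_ratio(100, 200, 5e6, 5e6))  # -1.0
print(b_allele_frequency(ref_reads=45, alt_reads=55))  # 0.55
```

In a pure diploid sample, a copy-neutral region gives a log2 ratio near 0 with heterozygous SNPs near a BAF of 0.5, whereas a single-copy loss gives a log2 ratio near −1 and BAF values near 0 or 1. Normal cell contamination pulls both signals back toward the neutral values, which is why it has to be modelled explicitly.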

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-732) contains supplementary material, which is available to authorized users.

10.

SIDD: A Semantically Integrated Database towards a Global View of Human Disease

Liang Cheng Guohua Wang Jie Li Tianjiao Zhang Peigang Xu Yadong Wang 《PloS one》2013,8(10)

Background

A number of databases have been developed to collect disease-related molecular, phenotypic and environmental features (DR-MPEs), such as genes, non-coding RNAs, genetic variations, drugs, phenotypes and environmental factors. However, each existing database focuses on only one or two types of DR-MPE. There is therefore an urgent need for an integrated database that establishes semantic associations among disease-related databases and links them to provide a global view of human disease at the biological level. Such a database would allow researchers to query various DR-MPEs by disease and to investigate disease mechanisms from different types of data.

Methodology

To establish an integrated disease-associated database, disease vocabularies used in different databases are mapped to the Disease Ontology (DO) through semantic matching: 4,284 disease terms from Medical Subject Headings (MeSH) and 4,186 from Online Mendelian Inheritance in Man (OMIM) are mapped to DO. The relationships between DR-MPEs and diseases are then extracted from the different source databases and merged to reduce data redundancy.
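
For illustration only, the term-mapping step can be approximated by a purely lexical matcher: normalise each term (lowercase, drop punctuation, sort the words) and look it up in an index of DO names and synonyms. This Python sketch is an assumption-laden stand-in for the semantic matching described above, not SIDD's actual procedure; the function names are hypothetical and the identifiers in the example are placeholders, not real DO IDs.

```python
import re
from typing import Optional


def normalise(term: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the words so word order is ignored."""
    words = re.sub(r"[^\w\s]", " ", term.lower()).split()
    return " ".join(sorted(words))


def build_index(do_terms: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalised DO name or synonym to its DO identifier (first one wins)."""
    index: dict[str, str] = {}
    for do_id, names in do_terms.items():
        for name in names:
            index.setdefault(normalise(name), do_id)
    return index


def map_terms(source_terms: list[str], index: dict[str, str]) -> dict[str, Optional[str]]:
    """Return the DO identifier for each source term, or None if there is no match."""
    return {term: index.get(normalise(term)) for term in source_terms}


# Placeholder identifiers, not real DO IDs.
do_index = build_index({
    "DO:placeholder-1": ["type 2 diabetes mellitus"],
    "DO:placeholder-2": ["asthma"],
})
print(map_terms(["Diabetes Mellitus, Type 2", "Asthma", "Gout"], do_index))
# {'Diabetes Mellitus, Type 2': 'DO:placeholder-1', 'Asthma': 'DO:placeholder-2', 'Gout': None}
```

Sorting the words lets a MeSH-style inverted heading such as "Diabetes Mellitus, Type 2" match the natural-order name; a real semantic matcher would also need synonyms, abbreviations and hierarchy.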

Conclusions

A semantically integrated disease-associated database (SIDD) has been developed that integrates 18 disease-associated databases, allowing researchers to browse multiple types of DR-MPEs in a single view. A web interface allows easy navigation, with information queried by browsing a disease ontology tree or searching for a disease term. Furthermore, a network visualization tool using the Cytoscape Web plugin has been implemented in SIDD, making it easier to view the relationships between diseases and DR-MPEs. The current version of SIDD (Jul 2013) documents 4,465,131 entries relating to 139,365 DR-MPEs and 3,824 human diseases. The database can be freely accessed at http://mlg.hit.edu.cn/SIDD.

11.

MendeLIMS: a web-based laboratory information management system for clinical genome sequencing

Susan M Grimes Hanlee P Ji 《BMC bioinformatics》2014,15(1)

Background

Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis.

Results

To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that adapts to the continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system, referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies.

Conclusions

We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-290) contains supplementary material, which is available to authorized users.

12.

Detection of internal exon deletion with exon Del

Yan Guo Shilin Zhao Brian D Lehmann Quanhu Sheng Timothy M Shaver Thomas P Stricker Jennifer A Pietenpol Yu Shyr 《BMC bioinformatics》2014,15(1)

Background

Exome sequencing allows researchers to study the human genome in unprecedented detail. Among the many types of variants detectable through exome sequencing, one of the most overlooked types of mutation is internal deletion of exons. Internal exon deletions (IEDs) are the absence of consecutive exons in a gene. Such deletions are potentially of significant biological importance, yet they are often too short to be considered copy number variation. There is therefore a need for efficient detection of such deletions using exome sequencing data.

Results

We present ExonDel, a tool specially designed to detect homozygous exon deletions efficiently. We tested ExonDel on exome sequencing data generated from 16 breast cancer cell lines and identified both novel and known IEDs. Subsequently, we verified our findings using RNAseq and PCR technologies. Further comparisons with multiple sequencing-based CNV tools showed that ExonDel is capable of detecting unique IEDs not found by other CNV tools.

Conclusions

ExonDel is an efficient way to screen for novel and known IEDs using exome sequencing data. ExonDel and its source code can be downloaded freely at https://github.com/slzhao/ExonDel.
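
Because a homozygous deletion removes all reads from the affected exons, a first-pass screen for IEDs can look for runs of consecutive exons with (near-)zero coverage that are flanked on both sides by covered exons. The Python sketch below assumes a list of mean per-exon depths in transcript order; the `max_depth` threshold and the function name are hypothetical, and this is not ExonDel's published algorithm.

```python
def internal_exon_deletions(mean_depth: list[float],
                            max_depth: float = 0.0) -> list[tuple[int, int]]:
    """Return inclusive (first, last) exon indices of runs of consecutive exons with
    mean depth <= max_depth that have a covered exon on both sides."""
    runs: list[tuple[int, int]] = []
    start = None
    for i, depth in enumerate(mean_depth):
        if depth <= max_depth:
            if start is None:
                start = i  # a low-coverage run begins
        elif start is not None:
            runs.append((start, i - 1))  # run ended at the previous exon
            start = None
    if start is not None:
        runs.append((start, len(mean_depth) - 1))  # run reaches the last exon
    last = len(mean_depth) - 1
    # Keep only internal runs: not touching the first or the last exon.
    return [(s, e) for s, e in runs if s > 0 and e < last]


# Hypothetical per-exon depths for one gene: exons 2 and 3 have no reads.
print(internal_exon_deletions([30.0, 25.0, 0.0, 0.0, 28.0, 31.0]))  # [(2, 3)]
```

Requiring covered exons on both sides is what separates an internal deletion from a loss of the gene's 5' or 3' end.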

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-332) contains supplementary material, which is available to authorized users.

13.

BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions

Kasper D Hansen Benjamin Langmead Rafael A Irizarry 《Genome biology》2012,13(10):R83

DNA methylation is an important epigenetic modification involved in gene regulation, which can now be measured using whole-genome bisulfite sequencing. However, cost, complexity of the data, and lack of comprehensive analytical tools are major challenges that keep this technology from becoming widely applied. Here we present BSmooth, an alignment, quality control and analysis pipeline that provides accurate and precise results even with low coverage data, appropriately handling biological replicates. BSmooth is open source software, and can be downloaded from http://rafalab.jhsph.edu/bsmooth.

14.

Predicting the Functional Effect of Amino Acid Substitutions and Indels

Yongwook Choi Gregory E. Sims Sean Murphy Jason R. Miller Agnes P. Chan 《PloS one》2012,7(10)

As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions of the functional effects of sequence variations and to narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and nonsense mutations. Frameshifts and nonsense mutations are likely to have a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions by examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects not only of single amino acid substitutions but also of in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
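
The core idea of the alignment-based score, comparing a query's similarity to a homolog before and after a variant is introduced, can be sketched with a global (Needleman-Wunsch) alignment. For simplicity this Python sketch assumes a single homolog and a toy match/mismatch/linear-gap scheme rather than a real substitution matrix, so it illustrates the delta-score concept only and is not PROVEAN's implementation; the function names and scoring values are hypothetical.

```python
def global_alignment_score(a: str, b: str, match: int = 2,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score with a linear gap penalty (toy scoring scheme)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for an empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def delta_score(reference: str, variant: str, homolog: str) -> int:
    """Similarity to the homolog after the variant minus similarity before it."""
    return global_alignment_score(variant, homolog) - global_alignment_score(reference, homolog)


ref = homolog = "MKTAYIAK"
print(delta_score(ref, "MKTPYIAK", homolog))  # -3: one substitution
print(delta_score(ref, "MKTYIAK", homolog))   # -4: in-frame single-residue deletion
```

A negative delta means the variant made the query less similar to the homolog. The same function covers substitutions, in-frame deletions and insertions because the alignment is free to open gaps, which is the property that makes this kind of score applicable beyond single amino acid substitutions.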

15.

PROCAIN: protein profile comparison with assisting information

Yong Wang Ruslan I. Sadreyev Nick V. Grishin 《Nucleic acids research》2009,37(11):3522-3530

Detection of remote sequence homology is essential for the accurate inference of protein structure, function and evolution. The most sensitive detection methods involve the comparison of evolutionary patterns reflected in multiple sequence alignments (MSAs) of protein families. We present PROCAIN, a new method for MSA comparison based on the combination of ‘vertical’ MSA context (substitution constraints at individual sequence positions) and ‘horizontal’ context (patterns of residue content at multiple positions). Based on a simple and tractable profile methodology and primitive measures for the similarity of horizontal MSA patterns, the method achieves homology detection quality comparable to that of a more complex advanced method employing hidden Markov models (HMMs) and secondary structure (SS) prediction. Adding SS information further improves PROCAIN performance beyond the capabilities of current state-of-the-art tools. The potential value of the method for structure/function predictions is illustrated by the detection of subtle homology between evolutionarily distant yet structurally similar protein domains. ProCAIn, relevant databases and tools can be downloaded from: http://prodata.swmed.edu/procain/download. The web server can be accessed at http://prodata.swmed.edu/procain/procain.php.

16.

SNP@lincTFBS: An Integrated Database of Polymorphisms in Human LincRNA Transcription Factor Binding Sites

Shangwei Ning Zuxianglan Zhao Jingrun Ye Peng Wang Hui Zhi Ronghong Li Tingting Wang Jianjian Wang Lihua Wang Xia Li 《PloS one》2014,9(7)


17.

A Unique, Consistent Identifier for Alternatively Spliced Transcript Variants

Alberto Riva Graziano Pesole 《PloS one》2009,4(10)


18.

miRspring: a compact standalone research tool for analyzing miRNA-seq data

David T. Humphreys Catherine M. Suter 《Nucleic acids research》2013,41(15):e147

High-throughput sequencing for microRNA (miRNA) profiling has revealed a vast complexity of miRNA processing variants, but these are difficult to discern for those without bioinformatics expertise and large computing capability. In this article, we present miRNA Sequence Profiling (miRspring) (http://mirspring.victorchang.edu.au), a software solution that creates a small portable research document that visualizes, calculates and reports on the complexities of miRNA processing. We designed an index-compression algorithm that allows the miRspring document to reproduce a complete miRNA sequence data set while retaining a small file size (typically <3 MB). Through analysis of 73 public data sets, we demonstrate miRspring’s features in assessing quality parameters, miRNA cluster expression levels and miRNA processing. Additionally, we report on a new class of miRNA variants, which we term seed-isomiRs, identified through the novel visualization tools of the miRspring document. Further investigation identified that ∼30% of human miRBase entries are likely to have a seed-isomiR. We believe that miRspring will be a highly useful research tool that will enhance the analysis of miRNA data sets and thus increase our understanding of miRNA biology.

19.

VAReporter: variant reporter for cancer research of massive parallel sequencing

Po-Jung Huang Chi-Ching Lee Ling-Ya Chiu Kuo-Yang Huang Yuan-Ming Yeh Chia-Yu Yang Cheng-Hsun Chiu Petrus Tang 《BMC genomics》2018,19(2):86

Background

High-throughput sequencing technologies have become an increasingly critical part of precision medicine owing to better identification of disease targets, which contributes to reduced health care costs and improved clinical outcomes. In particular, disease-oriented targeted enrichment sequencing is becoming widely accepted for diagnostic purposes, as it can interrogate known diagnostic variants and identify novel biomarkers using panels covering the entire human coding exome or disease-associated genes.

Results

We introduce a workflow named VAReporter to facilitate the management of variant assessment in disease-targeted sequencing, the identification of pathogenic variants, the interpretation of biological effects and the prioritization of clinically actionable targets. State-of-the-art algorithms that account for mutation phenotypes are used to rank the importance of mutated genes through visual analytic strategies. We established an extensive annotation source by integrating a wide variety of biomedical databases and followed the American College of Medical Genetics and Genomics (ACMG) guidelines for interpretation and reporting of sequence variations.

Conclusions

In summary, VAReporter is the first web server designed to provide a “one-stop” resource for both individual diagnosis and large-scale cohort studies, and it is freely available at http://rnd.cgu.edu.tw/vareporter.


20.

TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions

Daehwan Kim Geo Pertea Cole Trapnell Harold Pimentel Ryan Kelley Steven L Salzberg 《Genome biology》2013,14(4):R36
