1.
Pubudu Saneth Samarakoon Hanne Sørmo Sorte Bjørn Evert Kristiansen Tove Skodje Ying Sheng Geir E Tjønnfjord Barbro Stadheim Asbjørg Stray-Pedersen Olaug Kristin Rødningen Robert Lyle 《BMC genomics》2014,15(1)
Background
With advances in next-generation sequencing technologies and genomic capture techniques, exome sequencing has become a cost-effective approach for mutation detection in genetic diseases. However, computational prediction of copy number variants (CNVs) from exome sequence data is a challenging task. Whilst numerous programs are available, they differ in sensitivity and have low sensitivity for smaller CNVs (1–4 exons). Additionally, exonic CNV discovery using standard aCGH is limited by the low probe density over exonic regions. The goal of our study was to develop a protocol to detect exonic CNVs (including shorter CNVs that cover 1–4 exons), combining computational prediction algorithms and a high-resolution custom CGH array.
Results
We used six published CNV prediction programs (ExomeCNV, CONTRA, ExomeCopy, ExomeDepth, CoNIFER, XHMM) and an in-house modification to ExomeCopy and ExomeDepth (ExCopyDepth) for computational CNV prediction on 30 exomes from the 1000 Genomes Project and 9 exomes from primary immunodeficiency patients. CNV predictions were tested using a custom CGH array designed to capture all exons (exaCGH). After this validation, we evaluated the computational prediction of shorter CNVs. ExomeCopy and the in-house modified algorithm, ExCopyDepth, showed the highest capability in detecting shorter CNVs. Finally, the performance of each computational program was assessed by calculating the sensitivity and false positive rate.
Conclusions
In this paper, we assessed the ability of 6 computational programs to predict CNVs, focussing on short (1–4 exon) CNVs. We also tested these predictions using a custom array targeting exons. Based on these results, we propose a protocol to identify and confirm shorter exonic CNVs combining computational prediction algorithms and custom aCGH experiments.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-661) contains supplementary material, which is available to authorized users.
2.
3.
Determining the quality and complexity of next-generation sequencing data without a reference genome
Seyed Yahya Anvar Lusine Khachatryan Martijn Vermaat Michiel van Galen Irina Pulyakhina Yavuz Ariyurek Ken Kraaijeveld Johan T den Dunnen Peter de Knijff Peter AC ’t Hoen Jeroen FJ Laros 《Genome biology》2014,15(12)
We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at https://github.com/LUMC/kPAL.
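The idea of comparing sequencing datasets through their k-mer frequencies, without any alignment, can be illustrated with a minimal sketch. This is not kPAL's implementation or API: the function names `kmer_profile` and `profile_distance` are hypothetical, and the Euclidean distance on normalized frequencies is an assumption chosen for simplicity rather than the metric kPAL uses.

```python
from collections import Counter
from math import sqrt


def kmer_profile(reads, k):
    """Return normalized k-mer frequencies for a collection of reads.

    Hypothetical helper for illustration; k-mers containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles.

    Assumed metric for this sketch only; k-mers absent from one
    profile are treated as having frequency zero.
    """
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))
```

Two libraries prepared and sequenced in the same way would be expected to give a small distance, whereas contamination or a different library protocol shifts the profile and increases it. Real tools normalize for library size and choose k with more care than this sketch does.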
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0555-3) contains supplementary material, which is available to authorized users.
4.
5.
Daniel Paulino René L. Warren Benjamin P. Vandervalk Anthony Raymond Shaun D. Jackman Inanç Birol 《BMC bioinformatics》2015,16(1)
Background
While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment “gaps” – uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads. Even though there are several tools for closing gaps, they do not easily scale up to processing billion base pair genomes.
Results
Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate how it scales to successfully close 50.8 % and 13.8 % of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively – a feat that is not possible with other leading tools with the breadth of data used in our study.
Conclusion
Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including that of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0663-4) contains supplementary material, which is available to authorized users.
6.
Kaushalya C Amarasinghe Jason Li Sally M Hunter Georgina L Ryland Prue A Cowin Ian G Campbell Saman K Halgamuge 《BMC genomics》2014,15(1)
Background
Using whole exome sequencing to predict aberrations in tumours is a cost-effective alternative to whole genome sequencing, but it is predominantly used for variant detection and infrequently utilised for detection of somatic copy number variation.
Results
We propose a new method to infer copy number and genotypes using whole exome data from paired tumour/normal samples. Our algorithm uses two Hidden Markov Models to predict copy number and genotypes and computationally resolves polyploidy/aneuploidy, normal cell contamination and signal baseline shift. Our method explicitly detects chromosome arm-level events, which are commonly found in tumour samples. The methods are combined into a package named ADTEx (Aberration Detection in Tumour Exome). We applied our algorithm to a cohort of 17 in-house generated and 18 TCGA paired ovarian cancer/normal exomes and evaluated the performance by comparing against the copy number variations and genotypes predicted using Affymetrix SNP 6.0 data of the same samples. Further, we carried out a comparison study to show that ADTEx outperformed its competitors in terms of precision and F-measure.
Conclusions
Our proposed method, ADTEx, uses both depth of coverage ratios and B allele frequencies calculated from whole exome sequencing data to predict copy number variations along with their genotypes. ADTEx is implemented as a user-friendly software package using Python and the R statistical language. Source code and sample data are freely available under GNU license (GPLv3) at http://adtex.sourceforge.net/.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-732) contains supplementary material, which is available to authorized users.
7.
Background
Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging.
Results
To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms.
Conclusion
Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users.
8.
Yi Qiao Aaron R Quinlan Amir A Jazaeri Roeland GW Verhaak David A Wheeler Gabor T Marth 《Genome biology》2014,15(8)
Many tumors are composed of genetically divergent cell subpopulations. We report SubcloneSeeker, a package capable of exhaustive identification of subclone structures and evolutionary histories with bulk somatic variant allele frequency measurements from tumor biopsies. We present a statistical framework to elucidate whether specific sets of mutations are present within the same subclones, and the order in which they occur. We demonstrate how subclone reconstruction provides crucial information about tumorigenesis and relapse mechanisms, guides functional study by variant prioritization, and has potential as a rational basis for informed therapeutic strategies for the patient. SubcloneSeeker is available at: https://github.com/yiq/SubcloneSeeker.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0443-x) contains supplementary material, which is available to authorized users.
9.
So-Young Bang Young-Ji Na Kwangwoo Kim Young Bin Joo Youngho Park Jaemoon Lee Sun-Young Lee Adnan A Ansari Junghee Jung Hwanseok Rhee Jong-Young Lee Bok-Ghee Han Sung-Min Ahn Sungho Won Hye-Soon Lee Sang-Cheol Bae 《Arthritis research & therapy》2014,16(5)
Introduction
Although it has been suggested that rare coding variants could explain the substantial missing heritability, very few sequencing studies have been performed in rheumatoid arthritis (RA). We aimed to identify novel functional variants with rare to low frequency using targeted exon sequencing of RA in Korea.
Methods
We analyzed targeted exon sequencing data of 398 genes selected from a multifaceted approach in Korean RA patients (n = 1,217) and controls (n = 717). We conducted a single-marker association test and a gene-based analysis of rare variants. For meta-analysis or enrichment tests, we also used ethnically matched independent samples of Korean genome-wide association studies (GWAS) (n = 4,799) or immunochip data (n = 4,722).
Results
After stringent quality control, we analyzed 10,588 variants of 398 genes from 1,934 Korean RA cases and controls. We identified 13 nonsynonymous variants with nominal association in single-variant association tests. In a meta-analysis, we did not find any novel variant with genome-wide significance for RA risk. Using a gene-based approach, we identified 17 genes with nominal burden signals. Among them, VSTM1 showed the greatest association with RA (P = 7.80 × 10^-4). In the enrichment test using Korean GWAS, although the significant signal appeared to be driven by total genic variants, we found no evidence for enriched association of coding variants only with RA.
Conclusions
We were unable to identify rare coding variants with large effect to explain the missing heritability for RA in the current targeted resequencing study. Our study raises skepticism about exon sequencing of targeted genes for complex diseases like RA.
Electronic supplementary material
The online version of this article (doi:10.1186/s13075-014-0447-7) contains supplementary material, which is available to authorized users.
10.
MiRNAs play important roles in many diseases, including cancers. However, computational prediction of miRNA target genes is challenging, and the accuracy of existing methods remains poor. We report mirMark, a new machine learning-based method of miRNA target prediction at the site and UTR levels. This method uses experimentally verified miRNA targets from miRecords and mirTarBase as training sets and considers over 700 features. By combining Correlation-based Feature Selection with a variety of statistical or machine learning methods for the site- and UTR-level classifiers, mirMark significantly improves the overall predictive performance compared to existing publicly available methods. MirMark is available from https://github.com/lanagarmire/MirMark.
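Correlation-based Feature Selection scores a subset of k features by rewarding correlation with the class and penalizing redundancy among the features, using the merit k·r_cf / sqrt(k + k(k−1)·r_ff). The sketch below computes that merit for a candidate subset. It is an illustration only, not mirMark's code: the names `pearson` and `cfs_merit` are hypothetical, and the use of absolute Pearson correlation for both r_cf and r_ff is an assumption made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (assumption made so
    that uninformative columns do not raise a division error).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cfs_merit(features, labels, subset):
    """CFS merit of a feature subset.

    features: list of feature columns (each a list of numbers)
    labels:   numeric class labels, one per sample
    subset:   indices into `features`
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(features[j], labels)) for j in subset) / k
    # Mean absolute feature-feature correlation over all pairs.
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

A search procedure (for example greedy forward selection) would call `cfs_merit` on candidate subsets and keep the best-scoring one. Adding a feature that merely duplicates one already chosen does not raise the merit, which is how this criterion limits redundancy in a large feature set.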
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0500-5) contains supplementary material, which is available to authorized users.
11.
12.
Background
First-pass methods based on BLAST matches are commonly used as an initial step to separate the different phylogenetic histories of genes in microbial genomes, and target putative horizontal gene transfer (HGT) events. This will continue to be necessary given the rapid growth of genomic data and the technical difficulties in conducting large-scale explicit phylogenetic analyses. However, these methods often produce misleading results due to their inability to resolve indirect phylogenetic links and their vulnerability to stochastic events.
Results
A new computational method of rapid, exhaustive and genome-wide detection of HGT was developed, featuring the systematic analysis of BLAST hit distribution patterns in the context of a priori defined hierarchical evolutionary categories. Genes that fall beyond a series of statistically determined thresholds are identified as not adhering to the typical vertical history of the organisms in question, but instead having a putative horizontal origin. Tests on simulated genomic data suggest that this approach effectively targets atypically distributed genes that are highly likely to be HGT-derived, and exhibits robust performance compared to conventional BLAST-based approaches. This method was further tested on real genomic datasets, including Rickettsia genomes, and was compared to previous studies. Results show consistency with currently employed categories of HGT prediction methods. In-depth analysis of both simulated and real genomic data suggests that the method is notably insensitive to stochastic events such as gene loss, rate variation and database error, which are common challenges to the current methodology. An automated pipeline was created to implement this approach and was made publicly available at: https://github.com/DittmarLab/HGTector. The program is versatile, easily deployed, and has a low requirement for computational resources.
Conclusions
HGTector is an effective tool for initial or standalone large-scale discovery of candidate HGT-derived genes.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-717) contains supplementary material, which is available to authorized users.
13.
Background
The advent of low-cost, fast sequencing technologies has made metagenomic analyses possible. The large data volumes gathered by these techniques and the unpredictable diversity captured in them are still, however, a challenge for computational biology.
Results
In this paper we address the problem of rapid taxonomic assignment with small and adaptive data models (< 5 MB) and present the accelerated k-mer explorer (AKE). Acceleration in AKE’s taxonomic assignments is achieved by a special machine learning architecture, which is well suited to model data collections that are intrinsically hierarchical. We report reasonably good classification accuracy for ranks down to order in a study on real-world data (Acid Mine Drainage, Cow Rumen).
Conclusion
We show that the execution time of this approach is orders of magnitude shorter than that of competing approaches and that accuracy is comparable. The tool is presented to the public as a web application (url: https://ani.cebitec.uni-bielefeld.de/ake/, username: bmc, password: bmcbioinfo).
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0384-0) contains supplementary material, which is available to authorized users.
14.
Background
Next generation sequencing (NGS) offers a rapid and comprehensive method of screening for mutations associated with retinitis pigmentosa and related disorders. However, certain sequence alterations such as large insertions or deletions may remain undetected using standard NGS pipelines. One such mutation is a recently-identified Alu insertion into the Male Germ Cell-Associated Kinase (MAK) gene, which is missed by standard NGS-based variant callers. Here, we developed an in silico method of searching NGS raw sequence reads to detect this mutation, without the need to recalculate sequence alignments or to screen every sample by PCR.
Methods
The Linux program grep was used to search for a 23 bp “probe” sequence containing the known junction sequence of the insert. A corresponding search was performed with the wildtype sequence. The matching reads were counted and further compared to the known sequences of the full wildtype and mutant genomic loci. (See https://github.com/MEEIBioinformaticsCenter/grepsearch.)
Results
In a test sample set consisting of eleven previously published homozygous mutants, detection of the MAK-Alu insertion was validated with 100% sensitivity and specificity. As a discovery cohort, raw NGS reads from 1,847 samples (including custom and whole exome selective capture) were searched in ~1 hour on a local computer cluster, yielding an additional five samples with MAK-Alu insertions and solving two previously unsolved pedigrees. Of these, one patient was homozygous for the insertion, one compound heterozygous with a missense change on the other allele (c.46G>A; p.Gly16Arg), and three were heterozygous carriers.
Conclusions
Using the MAK-Alu grep program proved to be a rapid and effective method of finding a known, disease-causing Alu insertion in a large cohort of patients with NGS data. This simple approach avoids wet-lab assays or computationally expensive algorithms, and could also be used for other known disease-causing insertions and deletions.
15.
Velina Kozareva Clayton Stroff Maxwell Silver Jonathan F. Freidin Nigel F. Delaney 《BMC medical genomics》2018,11(1):91
Background
Detection of copy number variants (CNVs) is an important aspect of clinical testing for several disorders, including Duchenne muscular dystrophy, and is often performed using multiplex ligation-dependent probe amplification (MLPA). However, since many genetic carrier screens depend instead on next-generation sequencing (NGS) for wider discovery of small variants, they often do not include CNV analysis. Moreover, most computational techniques developed to detect CNVs from exome sequencing data are not suitable for carrier screening, as they require matched normals, very large cohorts, or extensive gene panels.
Methods
We present a computational software package, geneCNV (http://github.com/vkozareva/geneCNV), which can identify exon-level CNVs using exome sequencing data from only a few genes. The tool relies on a hierarchical parametric model trained on a small cohort of reference samples.
Results
Using geneCNV, we accurately inferred heterozygous CNVs in the DMD gene across a cohort of 15 test subjects. These results were validated against MLPA, the current standard for clinical CNV analysis in DMD. We also benchmarked the tool’s performance against other computational techniques and found comparable or improved CNV detection in DMD using data from panels ranging from 4,000 genes to as few as 8 genes.
Conclusions
geneCNV allows for the creation of cost-effective screening panels by allowing NGS approaches to generate results equivalent to bespoke genotyping assays like MLPA. By using a parametric model to detect CNVs, it also fulfills regulatory requirements to define a reference range for a genetic test. It is freely available and can be incorporated into any Illumina sequencing pipeline to create clinical assays for detection of exon duplications and deletions.
16.
17.
Vadim I. Nazarov Mikhail V. Pogorelyy Ekaterina A. Komech Ivan V. Zvyagin Dmitry A. Bolotin Mikhail Shugay Dmitry M. Chudakov Yury B. Lebedev Ilgar Z. Mamedov 《BMC bioinformatics》2015,16(1)
Background
The immunoglobulins (IG) and the T cell receptors (TR) play a key role in antigen recognition during the adaptive immune response. Recent progress in next-generation sequencing technologies has provided an opportunity for deep T cell receptor repertoire profiling. However, specialised software is required for the rational analysis of the massive data generated by next-generation sequencing.
Results
Here we introduce tcR, a new R package that provides a platform for the advanced analysis of T cell receptor repertoires, including diversity measures, identification of shared T cell receptor sequences, gene usage statistics and other widely used methods. The tool has proven its utility in recent research studies.
Conclusions
tcR is an R package for the advanced analysis of T cell receptor repertoires after primary TR sequences extraction from raw sequencing reads. The stable version can be directly installed from The Comprehensive R Archive Network (http://cran.r-project.org/mirrors.html). The source code and development version are available at tcR GitHub (http://imminfo.github.io/tcr/) along with the full documentation and typical usage examples.
18.
Background
Comparing and aligning genomes is a key step in analyzing closely related genomes. Despite the development of many genome aligners in the last 15 years, the problem is not yet fully resolved, even when aligning closely related bacterial genomes of the same species. In addition, no procedures are available to assess the quality of genome alignments or to compare genome aligners.
Results
We designed an original method for pairwise genome alignment, named YOC, which employs a highly sensitive similarity detection method together with a recent collinear chaining strategy that allows overlaps. YOC improves the reliability of collinear genome alignments, while preserving or even improving sensitivity. We also propose an original qualitative evaluation criterion for measuring the relevance of genome alignments. We used this criterion to compare and benchmark YOC with five recent genome aligners on large bacterial genome datasets, and showed it is suitable for identifying the specificities and the potential flaws of their underlying strategies.
Conclusions
The YOC prototype is available at https://github.com/ruricaru/YOC. It has several advantages over existing genome aligners: (1) it is based on a simplified two-phase alignment strategy, (2) it is easy to parameterize, and (3) it produces reliable genome alignments, which are easier to analyze and to use.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0530-3) contains supplementary material, which is available to authorized users.
19.
Ikuko N Motoike Mitsuyo Matsumoto Inaho Danjoh Fumiki Katsuoka Kaname Kojima Naoki Nariai Yukuto Sato Yumi Yamaguchi-Kabata Shin Ito Hisaaki Kudo Ichiko Nishijima Satoshi Nishikawa Xiaoqing Pan Rumiko Saito Sakae Saito Tomo Saito Matsuyuki Shirota Kaoru Tsuda Junji Yokozawa Kazuhiko Igarashi Naoko Minegishi Osamu Tanabe Nobuo Fuse Masao Nagasaki Kengo Kinoshita Jun Yasuda Masayuki Yamamoto 《BMC genomics》2014,15(1)