Similar literature
20 similar articles found (search time: 31 ms)
1.

Background

Recent developments in deep (next-generation) sequencing technologies are significantly impacting medical research. The global analysis of protein coding regions in genomes of interest by whole exome sequencing is a widely used application. Many technologies for exome capture are commercially available; here we compare the performance of four of them: NimbleGen’s SeqCap EZ v3.0, Agilent’s SureSelect v4.0, Illumina’s TruSeq Exome, and Illumina’s Nextera Exome, all applied to the same human tumor DNA sample.

Results

Each capture technology was evaluated for its coverage of different exome databases, target coverage efficiency, GC bias, sensitivity in single nucleotide variant detection, sensitivity in small indel detection, and technical reproducibility. In general, all technologies performed well; however, our data demonstrated small, but consistent differences between the four capture technologies. Illumina technologies cover more bases in coding and untranslated regions. Furthermore, whereas most of the technologies provide reduced coverage in regions with low or high GC content, the Nextera technology tends to bias towards target regions with high GC content.
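As a rough illustration of how GC bias can be summarized, the sketch below bins capture targets by GC fraction and averages read depth per bin. It is not taken from the study: the function names, the (sequence, mean depth) input layout and the ten-bin default are assumptions made for this example.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def mean_depth_by_gc_bin(targets, n_bins=10):
    """Average read depth per GC bin.

    targets: iterable of (sequence, mean_depth) pairs (hypothetical layout).
    Returns {bin_index: mean depth}; bin i covers GC in [i/n_bins, (i+1)/n_bins).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, depth in targets:
        # Clamp GC == 1.0 into the last bin.
        idx = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        sums[idx] += depth
        counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sorted(sums)}


# Toy targets: AT-only, two balanced, GC-only.
targets = [("AATT", 10), ("ACGT", 40), ("AGCT", 60), ("GGCC", 20)]
profile = mean_depth_by_gc_bin(targets)
```

A flat profile across bins would indicate little GC bias, whereas lower depth in the lowest and highest bins corresponds to the pattern reported for most of the kits.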

Conclusions

We show key differences in performance between the four technologies. Our data should help researchers who are planning exome sequencing to select appropriate exome capture technology for their particular application.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-449) contains supplementary material, which is available to authorized users.

2.

Background

Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research and increasingly the clinical use of sequence data is likely to remain focused on the protein coding exome. We set out to quantify and understand how WGS compares with the targeted capture and sequencing of the exome (exome-seq), for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome targeted regions.

Results

We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that have been subject to both deep exome and whole genome sequencing. The scoring of detection sensitivity was based on sequence down sampling and reference to a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of sequence read coverage and reduced biases in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS only requires a mean of 14 reads. Known disease causing mutations are not biased towards easy or hard to sequence areas of the genome for either exome-seq or WGS.
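The down-sampling comparison described above can be sketched as follows: call SNPs at several down-sampled depths, score each call set against the gold-standard calls, and report the lowest depth that reaches a target sensitivity. This is a minimal sketch under stated assumptions, not the authors' pipeline; the function names and the representation of a SNP as a (chromosome, position) tuple are hypothetical.

```python
def sensitivity(called, gold):
    """Fraction of gold-standard SNPs that were recovered in a call set."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold-standard set is empty")
    return len(gold & set(called)) / len(gold)


def min_depth_for_sensitivity(calls_by_depth, gold, target=0.95):
    """Lowest mean depth whose call set reaches the target sensitivity.

    calls_by_depth: {mean_depth: iterable of SNP keys} from down-sampled data.
    Returns None when no depth reaches the target.
    """
    for depth in sorted(calls_by_depth):
        if sensitivity(calls_by_depth[depth], gold) >= target:
            return depth
    return None


# Toy data: 20 gold-standard SNPs, progressively better recovery with depth.
gold = [("chr1", pos) for pos in range(100, 120)]
calls_by_depth = {
    10: gold[:15],
    20: gold[:18],
    40: gold[:19] + [("chr1", 999)],  # one extra call that is not in the gold set
}
depth_needed = min_depth_for_sensitivity(calls_by_depth, gold)
```

Calls that are absent from the gold set do not affect sensitivity; they would count against precision, which this sketch does not measure.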

Conclusions

From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers benefits in uniformity of read coverage and more balanced allele ratio calls, both of which can in most cases be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection due to technical difficulties or probe failures. As WGS is intrinsically richer data that can provide insight into polymorphisms outside coding regions and reveal genomic rearrangements, it is likely to progressively replace exome-seq for many applications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-247) contains supplementary material, which is available to authorized users.

3.

Background

Exome sequencing allows researchers to study the human genome in unprecedented detail. Among the many types of variants detectable through exome sequencing, one of the most overlooked is the internal deletion of exons. Internal exon deletions (IEDs) are the absence of consecutive exons in a gene. Such deletions have potentially significant biological meaning, and they are often too short to be considered copy number variation. Therefore, there is a need for efficient detection of such deletions using exome sequencing data.

Results

We present ExonDel, a tool specially designed to detect homozygous exon deletions efficiently. We tested ExonDel on exome sequencing data generated from 16 breast cancer cell lines and identified both novel and known IEDs. Subsequently, we verified our findings using RNAseq and PCR technologies. Further comparisons with multiple sequencing-based CNV tools showed that ExonDel is capable of detecting unique IEDs not found by other CNV tools.
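To make the idea of an internal exon deletion concrete, the sketch below scans per-exon read depths for one gene and reports runs of consecutive exons with essentially no coverage that are flanked on both sides by well-covered exons. It is an illustrative heuristic only and is not ExonDel's algorithm; the function name and both depth thresholds are assumptions.

```python
def internal_exon_deletions(exon_depths, max_deleted_depth=1.0, min_flank_depth=10.0):
    """Find candidate homozygous internal exon deletions in one gene.

    exon_depths: mean read depth per exon, in transcript order.
    Returns a list of (first_exon_index, last_exon_index) pairs, inclusive,
    for runs of low-depth exons flanked by well-covered exons on both sides.
    """
    runs = []
    n = len(exon_depths)
    i = 0
    while i < n:
        if exon_depths[i] <= max_deleted_depth:
            # Extend the run of consecutive low-depth exons.
            j = i
            while j + 1 < n and exon_depths[j + 1] <= max_deleted_depth:
                j += 1
            # "Internal" means neither the first nor the last exon is in the run,
            # and both neighbouring exons are clearly covered.
            if (i > 0 and j < n - 1
                    and exon_depths[i - 1] >= min_flank_depth
                    and exon_depths[j + 1] >= min_flank_depth):
                runs.append((i, j))
            i = j + 1
        else:
            i += 1
    return runs


candidates = internal_exon_deletions([35, 40, 0, 0.5, 38, 42])
```

Requiring covered flanking exons is what separates an internal deletion from a gene that is simply poorly captured; a real tool would also need to normalize depth across samples.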

Conclusions

ExonDel is an efficient way to screen for novel and known IEDs using exome sequencing data. ExonDel and its source code can be downloaded freely at https://github.com/slzhao/ExonDel.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-332) contains supplementary material, which is available to authorized users.

4.

Background

Using whole exome sequencing to predict aberrations in tumours is a cost-effective alternative to whole genome sequencing; however, it is predominantly used for variant detection and infrequently utilised for detection of somatic copy number variation.

Results

We propose a new method to infer copy number and genotypes using whole exome data from paired tumour/normal samples. Our algorithm uses two Hidden Markov Models to predict copy number and genotypes and computationally resolves polyploidy/aneuploidy, normal cell contamination and signal baseline shift. Our method explicitly detects chromosome-arm-level events, which are commonly found in tumour samples. The methods are combined into a package named ADTEx (Aberration Detection in Tumour Exome). We applied our algorithm to a cohort of 17 in-house generated and 18 TCGA paired ovarian cancer/normal exomes and evaluated the performance by comparing against the copy number variations and genotypes predicted using Affymetrix SNP 6.0 data of the same samples. Further, we carried out a comparison study to show that ADTEx outperformed its competitors in terms of precision and F-measure.
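For readers unfamiliar with Hidden Markov Models in this setting, the following is a minimal Viterbi sketch that segments tumour/normal depth-of-coverage ratios into copy-number states. It is not ADTEx's model: the three states, the Gaussian emission centred on copy number / 2, and the values of `sigma` and `stay_prob` are all assumptions chosen for illustration, and contamination, ploidy and baseline shift are ignored.

```python
import math


def viterbi_copy_number(ratios, states=(1, 2, 3), sigma=0.15, stay_prob=0.9):
    """Most likely copy-number path for a series of tumour/normal depth ratios.

    Emission: Gaussian around copy_number / 2 with standard deviation sigma.
    Transition: stay_prob to remain in a state, the rest split evenly.
    """
    if not ratios:
        return []
    n = len(states)
    switch = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if a == b else switch) for b in range(n)]
                 for a in range(n)]

    def log_emit(ratio, copy_number):
        mu = copy_number / 2.0
        return -0.5 * ((ratio - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

    # Initialise with a uniform prior over states.
    score = [math.log(1.0 / n) + log_emit(ratios[0], cn) for cn in states]
    back = []
    for ratio in ratios[1:]:
        new_score, pointers = [], []
        for b, cn in enumerate(states):
            best_a = max(range(n), key=lambda a: score[a] + log_trans[a][b])
            new_score.append(score[best_a] + log_trans[best_a][b] + log_emit(ratio, cn))
            pointers.append(best_a)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


ratios = [1.0, 1.05, 0.95, 0.5, 0.55, 0.45, 1.0, 1.02, 1.5, 1.45]
copy_numbers = viterbi_copy_number(ratios)
```

The transition penalty is what keeps the path from flipping state on every noisy exon, which is the main reason an HMM is preferred over thresholding each ratio independently.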

Conclusions

Our proposed method, ADTEx, uses both depth of coverage ratios and B allele frequencies calculated from whole exome sequencing data to predict copy number variations along with their genotypes. ADTEx is implemented as a user-friendly software package using Python and the R statistical language. Source code and sample data are freely available under GNU license (GPLv3) at http://adtex.sourceforge.net/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-732) contains supplementary material, which is available to authorized users.

5.

Background

Animal domestication involved drastic phenotypic changes driven by strong artificial selection and also resulted in new populations of breeds, established by humans. This study aims to identify genes that show evidence of recent artificial selection during pig domestication.

Results

Whole-genome resequencing of 30 individual pigs from domesticated breeds, Landrace and Yorkshire, and 10 Asian wild boars at ~16-fold coverage was performed, resulting in over 4.3 million SNPs for 19,990 genes. We constructed a comprehensive genome map of directional selection by detecting selective sweeps using an FST-based approach that detects directional selection in lineages leading to the domesticated breeds and using a haplotype-based test that detects ongoing selective sweeps within the breeds. We show that candidate genes under selection are significantly enriched for loci implicated in quantitative traits important to pig reproduction and production. The candidate gene with the strongest signals of directional selection belongs to group III of the metabotropic glutamate receptors, known to affect brain functions associated with eating behavior, suggesting that loci under strong selection include loci involved in behavioral traits in domesticated pigs, including tameness.
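As background for the FST-based scan, the sketch below computes a basic two-population FST from allele frequencies, per SNP and per window. It uses the textbook heterozygosity form, FST = (HT - HS) / HT, with equal population weights; the lineage-specific statistic used in the study is not specified in this abstract, so this is a stand-in and the function names are hypothetical.

```python
def heterozygosities(p1, p2):
    """Total (HT) and mean within-population (HS) expected heterozygosity."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return h_t, h_s


def snp_fst(p1, p2):
    """FST at one biallelic SNP; 0.0 when the site is monomorphic overall."""
    h_t, h_s = heterozygosities(p1, p2)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def window_fst(freq_pairs):
    """Window FST as a ratio of sums over SNPs, which down-weights rare variants."""
    num = den = 0.0
    for p1, p2 in freq_pairs:
        h_t, h_s = heterozygosities(p1, p2)
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


# Allele frequencies in (domestic, wild) populations at three toy SNPs.
window = [(0.0, 1.0), (0.3, 0.3), (0.2, 0.8)]
fst_value = window_fst(window)
```

Windows with unusually high values relative to the genome-wide distribution are the usual candidates for selective sweeps.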

Conclusions

We show that a significant proportion of selection signatures coincide with loci that were previously inferred to affect phenotypic variation in pigs. We further identify functional enrichment related to behavior, such as signal transduction and neuronal activities, for those targets of selection during domestication in pigs.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1330-x) contains supplementary material, which is available to authorized users.

6.

Background

Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.

Results

This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.

Conclusions

In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1333-7) contains supplementary material, which is available to authorized users.

7.

Background

Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variation calls simultaneously.

Results

Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.

Conclusions

Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.

8.

Background

To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer and the Ion TargetSeq™ Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer.

Results

Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3 % in the four samples, whereas the concordance of co-detected variant loci reached 99 %. Sanger sequencing validation revealed that the validation rate of concordant single nucleotide polymorphisms (SNPs) (91.5 %) was higher than that of SNPs specific to TargetSeq-Proton (60.0 %) or specific to SureSelect-HiSeq (88.3 %). With regard to 1-bp small insertions and deletions (InDels), the Sanger validation rates of concordant variants (100.0 %) and SureSelect-HiSeq-specific variants (89.6 %) were higher than that of TargetSeq-Proton-specific variants (15.8 %).
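The two headline metrics, the rate of shared variant loci and the concordance at co-detected loci, can be sketched as below. The abstract does not give the exact denominators, so this example assumes that the shared rate is taken over the union of loci called by either platform and that concordance means identical genotypes at shared loci; the function name and dictionary layout are hypothetical.

```python
def compare_call_sets(calls_a, calls_b):
    """Compare two variant call sets restricted to the same target region.

    calls_a, calls_b: {(chrom, pos): genotype string} for each platform.
    Returns the shared-locus rate (over the union, an assumption) and the
    genotype concordance among loci called by both platforms.
    """
    loci_a, loci_b = set(calls_a), set(calls_b)
    shared = loci_a & loci_b
    union = loci_a | loci_b
    same_genotype = sum(1 for locus in shared if calls_a[locus] == calls_b[locus])
    return {
        "shared_rate": len(shared) / len(union) if union else 0.0,
        "genotype_concordance": same_genotype / len(shared) if shared else 0.0,
    }


proton = {("1", 100): "A/G", ("1", 200): "C/T", ("2", 50): "G/G", ("3", 10): "T/T"}
hiseq = {("1", 100): "A/G", ("1", 200): "C/C", ("2", 50): "G/G", ("4", 5): "A/A"}
metrics = compare_call_sets(proton, hiseq)
```

Platform-specific loci, the ones outside the intersection, are the calls whose Sanger validation rates differ most between the two platforms in the results above.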

Conclusions

In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, in particular, InDels in Ion Proton exome sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1796-6) contains supplementary material, which is available to authorized users.

9.

Background

Rapid and accurate retrieval of whole genome sequences of human pathogens from disease vectors or animal reservoirs will enable fine-resolution studies of pathogen epidemiological and evolutionary dynamics. However, next generation sequencing technologies have not yet been fully harnessed for the study of vector-borne and zoonotic pathogens, due to the difficulty of obtaining high-quality pathogen sequence data directly from field specimens with a high ratio of host to pathogen DNA.

Results

We addressed this challenge by using custom probes for multiplexed hybrid capture to enrich for and sequence 30 Borrelia burgdorferi genomes from field samples of its arthropod vector. Hybrid capture enabled sequencing of nearly the complete genome (~99.5 %) of the Borrelia burgdorferi pathogen with 132-fold coverage, and identification of up to 12,291 single nucleotide polymorphisms per genome.

Conclusions

The proposed culture-independent method enables efficient whole genome capture and sequencing of pathogens directly from arthropod vectors, thus making population genomic study of vector-borne and zoonotic infectious diseases economically feasible and scalable. Furthermore, given the similarities of invertebrate field specimens to other mixed DNA templates characterized by a high ratio of host to pathogen DNA, we discuss the potential applicability of hybrid capture for genomic study across diverse study systems.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1634-x) contains supplementary material, which is available to authorized users.

10.

Background

Copy number variations (CNVs) confer significant effects on genetic innovation and phenotypic variation. Previous CNV studies in swine seldom focused on in-depth characterization of global CNVs.

Results

Using whole-genome assembly comparison (WGAC) and whole-genome shotgun sequence detection (WSSD) approaches by next generation sequencing (NGS), we probed formation signatures of both segmental duplications (SDs) and individualized CNVs in an integrated fashion, building the finest resolution CNV and SD maps of pigs so far. We obtained copy number estimates of all protein-coding genes with copy number variation carried by individuals, and further confirmed two genes with high copy numbers in Meishan pigs through an enlarged population. We determined genome-wide CNV hotspots, which were significantly enriched in SD regions, suggesting that the evolution of CNV hotspots may be affected by ancestral SDs. Through systematic enrichment analyses based on simulations and bioinformatics analyses, we revealed that CNV-related genes undergo a different selective constraint from CNV-unrelated regions, and that CNVs may be associated with or affect pig health and production performance under recent selection.

Conclusions

Our studies lay out one way to characterize CNVs in the pig genome, provide insight into pig genome variation and prompt studies of CNV mechanisms when using pigs as biomedical models for human diseases.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-593) contains supplementary material, which is available to authorized users.

11.

Background

Understanding the genetic mechanisms that underlie meat quality traits is essential to improve pork quality. To date, most quantitative trait loci (QTL) analyses have been performed on F2 crosses between outbred pig strains and have led to the identification of numerous QTL. However, because linkage disequilibrium is high in such crosses, QTL mapping precision is unsatisfactory and only a few QTL have been found to segregate within outbred strains, which limits their use to improve animal performance. To detect QTL in outbred pig populations of Chinese and Western origins, we performed genome-wide association studies (GWAS) for meat quality traits in Chinese purebred Erhualian pigs and a Western Duroc × (Landrace × Yorkshire) (DLY) commercial population.

Methods

Three hundred and thirty six Chinese Erhualian and 610 DLY pigs were genotyped using the Illumina PorcineSNP60K Beadchip and evaluated for 20 meat quality traits. After quality control, 35 985 and 56 216 single nucleotide polymorphisms (SNPs) were available for the Chinese Erhualian and DLY datasets, respectively, and were used to perform two separate GWAS. We also performed a meta-analysis that combined P-values and effects of 29 516 SNPs that were common to Erhualian, DLY, F2 and Sutai pig populations.
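The abstract does not say which meta-analysis method was used to combine P-values across populations. As one common possibility, the sketch below applies Fisher's method, which assumes independent tests; it is offered only as an illustration and the function name is hypothetical. The chi-square tail probability is computed in closed form, which is possible because the degrees of freedom (2k for k studies) are always even.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values for one SNP with Fisher's method.

    Statistic: X = -2 * sum(ln p), chi-square with 2k degrees of freedom.
    For even degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one P-value, each in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# One SNP tested in four populations (toy values).
combined = fisher_combined_p([0.01, 0.20, 0.03, 0.50])
```

Fisher's method ignores effect direction, so a study that also combines effects, as described above, would typically use a weighted approach instead.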

Results

We detected 28 and nine suggestive SNPs that surpassed the significance level for meat quality in Erhualian and DLY pigs, respectively. Among these SNPs, ss131261254 on pig chromosome 4 (SSC4) was the most significant (P = 7.97E-09) and was associated with drip loss in Erhualian pigs. Our results suggested that at least two QTL on SSC12 and on SSC15 may have pleiotropic effects on several related traits. All the QTL that were detected by GWAS were population-specific, including 12 novel regions. However, the meta-analysis revealed seven novel QTL for meat characteristics, which suggests the existence of common underlying variants that may differ in frequency across populations. These QTL regions contain several relevant candidate genes.

Conclusions

These findings provide valuable insights into the molecular basis of convergent evolution of meat quality traits in Chinese and Western breeds that show divergent phenotypes. They may contribute to genetic improvement of purebreds for crossbred performance.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0120-x) contains supplementary material, which is available to authorized users.

12.
13.

Background

Mangalicas are fatty type local/rare pig breeds with an increasing presence in the niche pork market in Hungary and in other countries. To explore their genetic resources, we have analysed data from next-generation sequencing of an individual male from each of three Mangalica breeds along with a local male Duroc pig. Structural variations, such as SNPs, INDELs and CNVs, were identified and particular genes with SNP variations were analysed with special emphasis on functions related to fat metabolism in pigs.

Results

More than 60 Gb of sequence data were generated for each of the sequenced individuals, resulting in 11× to 19× autosomal median coverage. After stringent filtering, around six million SNPs, of which approximately 10% are novel compared to the dbSNP138 database, were identified in each animal. Several hundred thousand INDELs and about 1,000 CNV gains were also identified. The functional annotation of genes with exonic, non-synonymous SNPs, which are common in all three Mangalicas but are absent in either the reference genome or the sequenced Duroc of this study, highlighted 52 genes in lipid metabolism processes. Further analysis revealed that 41 of these genes are associated with lipid metabolic or regulatory pathways, 49 are in fat-metabolism and fatness-phenotype QTLs and, with the exception of ACACA, ANKRD23, GM2A, KIT, MOGAT2, MTTP, FASN, SGMS1, SLC27A6 and RETSAT, have not previously been associated with fat-related phenotypes.

Conclusions

Genome analysis of Mangalica breeds revealed that local/rare breeds could be a rich source of sequence variations not present in cosmopolitan/industrial breeds. The identified Mangalica variations may, therefore, be a very useful resource for future studies of agronomically important traits in pigs.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-761) contains supplementary material, which is available to authorized users.

14.
15.

Background

Deidentified newborn screening bloodspot samples (NBS) represent a valuable potential resource for genomic research if impediments to whole exome sequencing of NBS deoxyribonucleic acid (DNA), including the small amount of genomic DNA in NBS material, can be overcome. For instance, genomic analysis of NBS could be used to define allele frequencies of disease-associated variants in local populations, or to conduct prospective or retrospective studies relating genomic variation to disease emergence in pediatric populations over time. In this study, we compared the recovery of variant calls from exome sequences of amplified NBS genomic DNA to variant calls from exome sequencing of non-amplified NBS DNA from the same individuals.

Results

Using a standard alignment-based method, the Genome Analysis Toolkit (GATK), we find 62,000–76,000 additional variants in amplified samples. After application of a unique kmer enumeration and variant detection method (RUFUS), only 38,000–47,000 additional variants are observed in amplified gDNA. This result suggests that roughly half of the amplification-introduced variants identified using GATK may be the result of mapping errors and read misalignment.

Conclusions

Our results show that it is possible to obtain informative, high-quality data from exome analysis of whole genome amplified NBS with the important caveat that different data generation and analysis methods can affect variant detection accuracy, and the concordance of variant calls in whole-genome amplified and non-amplified exomes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1747-2) contains supplementary material, which is available to authorized users.

16.

Background

In the 1980s, Korean native black pigs from Jeju Island (Jeju black pigs) served as a representative sample of Korean native black pigs, and efforts were made to help the species rebound from the brink of extinction, which occurred as a result of the introduction of Western pig breeds. Geographical separation of Jeju Island from the Korean peninsula has allowed Jeju black pigs not only to acquire unique characteristics but also to retain merits of rare Korean native black pigs.

Results

To further analyze the Jeju black pig genome, we performed whole-genome re-sequencing (average read depth of 14×) of 8 Jeju black pigs and 6 Korean pigs (which live on the Korean peninsula) to compare and identify putative signatures of positive selection in the Jeju black pig, the true and pure Korean native black pig. The candidate genes potentially under positive selection in the Jeju black pig support previous reports of high marbling score and rare occurrence of pale, soft, exudative (PSE) meat, but low growth rate and carcass weight compared to Western breeds.

Conclusions

Several candidate genes potentially under positive selection were involved in fatty acid transport and may have contributed to the unique meat quality characteristics of Jeju black pigs. Jeju black pigs offer a unique opportunity to investigate the true genetic resource of the once endangered Korean native black pigs. Further genome-wide analyses of Jeju black pigs on a larger population scale are required to define a strategy for the conservation and improvement of native pig resources.

Electronic supplementary material

The online version of this article (doi:10.1186/s12863-014-0160-1) contains supplementary material, which is available to authorized users.

17.
18.
19.

Background

With advances in next generation sequencing technologies and genomic capture techniques, exome sequencing has become a cost-effective approach for mutation detection in genetic diseases. However, computational prediction of copy number variants (CNVs) from exome sequence data is a challenging task. Whilst numerous programs are available, their sensitivities differ, and sensitivity is low for smaller CNVs (1–4 exons). Additionally, exonic CNV discovery using standard aCGH has limitations due to the low probe density over exonic regions. The goal of our study was to develop a protocol to detect exonic CNVs (including shorter CNVs that cover 1–4 exons), combining computational prediction algorithms and a high-resolution custom CGH array.

Results

We used six published CNV prediction programs (ExomeCNV, CONTRA, ExomeCopy, ExomeDepth, CoNIFER, XHMM) and an in-house modification to ExomeCopy and ExomeDepth (ExCopyDepth) for computational CNV prediction on 30 exomes from the 1000 Genomes Project and 9 exomes from primary immunodeficiency patients. CNV predictions were tested using a custom CGH array designed to capture all exons (exaCGH). After this validation, we evaluated the computational prediction of shorter CNVs. ExomeCopy and the in-house modified algorithm, ExCopyDepth, showed the highest capability in detecting shorter CNVs. Finally, the performance of each computational program was assessed by calculating the sensitivity and false positive rate.

Conclusions

In this paper, we assessed the ability of 6 computational programs to predict CNVs, focussing on short (1–4 exon) CNVs. We also tested these predictions using a custom array targeting exons. Based on these results, we propose a protocol to identify and confirm shorter exonic CNVs combining computational prediction algorithms and custom aCGH experiments.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-661) contains supplementary material, which is available to authorized users.

20.

Background

Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subject to DNA sequencing, the samples could contain various amounts of mouse DNA. It has been unclear how the mouse reads would affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data.

Results

We found the "filtering" and "combined reference" strategies performed better than aligning reads directly to the human reference in terms of alignment and variant calling accuracies. The combined reference strategy was particularly good at reducing false negative variant calls without significantly increasing the false positive rate. In some scenarios the performance gain of these two special handling strategies was too small for special handling to be cost-effective, but it proved crucial when false non-synonymous SNVs must be minimized, especially in exome sequencing.
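The three alignment strategies can be contrasted with a toy read-assignment sketch. The abstract does not define the strategies in detail, so the code assumes the following reading: "direct" aligns to the human reference only, "filtering" first discards any read that aligns to mouse, and "combined reference" aligns to human and mouse together and keeps reads whose best hit is human. The function names, the score fields and the example scores are hypothetical.

```python
def keep_direct(reads):
    """Human reference only: keep every read that aligns to human."""
    return [name for name, human, mouse in reads if human is not None]


def keep_filtering(reads):
    """Drop reads that align to mouse, then keep the rest that align to human."""
    return [name for name, human, mouse in reads
            if mouse is None and human is not None]


def keep_combined(reads):
    """Human+mouse reference: keep reads whose best alignment is to human."""
    return [name for name, human, mouse in reads
            if human is not None and (mouse is None or human > mouse)]


# (read name, alignment score vs human, alignment score vs mouse); None = unaligned.
reads = [
    ("human_unique", 60, None),
    ("mouse_unique", None, 60),
    ("human_conserved", 58, 40),   # human read from a region conserved in mouse
    ("mouse_conserved", 40, 58),   # mouse read from a region conserved in human
]
direct, filtered, combined = keep_direct(reads), keep_filtering(reads), keep_combined(reads)
```

In this toy case, direct alignment retains the mouse read from a conserved region (a source of false positive variants), filtering loses the human read from a conserved region (a source of false negatives), and the combined reference keeps the human reads while rejecting the mouse ones.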

Conclusions

Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts. Our findings provide information for designing data analysis pipelines for these data.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1172) contains supplementary material, which is available to authorized users.
