Similar articles
1.

Background

Techniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.

Results

We developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays.

Conclusions

Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons.
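
As a rough illustration of why a coverage floor of about 11× is plausible for heterozygote calls, the sketch below uses a simple binomial model. It is not the pipeline described in the abstract: the function name `het_detection_probability`, the requirement of at least two reads per allele, and the 50/50 allele sampling are all assumptions made for illustration, and sequencing error is ignored.

```python
from math import comb


def het_detection_probability(depth, min_reads_per_allele=2, alt_fraction=0.5):
    """Probability that both alleles of a heterozygous site are each seen
    in at least `min_reads_per_allele` reads, under a binomial model.

    All parameters are illustrative assumptions, not values from the study.
    """
    if depth < 2 * min_reads_per_allele:
        return 0.0
    total = 0.0
    # Sum binomial probabilities over every admissible count of variant reads.
    for alt_reads in range(min_reads_per_allele, depth - min_reads_per_allele + 1):
        total += (comb(depth, alt_reads)
                  * alt_fraction ** alt_reads
                  * (1 - alt_fraction) ** (depth - alt_reads))
    return total


if __name__ == "__main__":
    for depth in (4, 8, 11, 20):
        print(depth, round(het_detection_probability(depth), 4))
```

Under these assumed settings the probability at depth 11 is about 0.988. Setting `alt_fraction` below 0.5, in the direction of the reference-allele bias the abstract reports, lowers it.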

2.
Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.

3.
Here we present an adaptation of NimbleGen 2.1M-probe array sequence capture for whole exome sequencing using the Illumina Genome Analyzer (GA) platform. The protocol involves two-stage library construction. The specificity of exome enrichment was approximately 80% with 95.6% even coverage of the 34 Mb target region at an average sequencing depth of 33-fold. Comparison of our results with whole genome shot-gun resequencing results showed that the exome SNP calls gave only 0.97% false positive and 6.27% false negative variants. Our protocol is also well suited for use with whole genome amplified DNA. The results presented here indicate that there is a promising future for large-scale population genomics and medical studies using a whole exome sequencing approach.

4.
Asan  Xu Y  Jiang H  Tyler-Smith C  Xue Y  Jiang T  Wang J  Wu M  Liu X  Tian G  Wang J  Wang J  Yang H  Zhang X 《Genome biology》2011,12(9):R95-12

Background

Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.

Results

We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.

Conclusions

We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.

5.
Over the next few years, the efficient use of next-generation sequencing (NGS) in human genetics research will depend heavily upon the effective mechanisms for the selective enrichment of genomic regions of interest. Recently, comprehensive exome capture arrays have become available for targeting approximately 33 Mb or ∼180,000 coding exons across the human genome. Selective genomic enrichment of the human exome offers an attractive option for new experimental designs aiming to quickly identify potential disease-associated genetic variants, especially in family-based studies. We have evaluated a 2.1 M feature human exome capture array on eight individuals from a three-generation family pedigree. We were able to cover up to 98% of the targeted bases at a long-read sequence read depth of ≥3, 86% at a read depth of ≥10, and over 50% of all targets were covered with ≥20 reads. We identified up to 14,284 SNPs and small indels per individual exome, with up to 1,679 of these representing putative novel polymorphisms. Applying the conservative genotype calling approach HCDiff, the average rate of detection of a variant allele based on Illumina 1 M BeadChips genotypes was 95.2% at ≥10x sequence. Further, we propose an advantageous genotype calling strategy for low covered targets that empirically determines cut-off thresholds at a given coverage depth based on existing genotype data. Application of this method was able to detect >99% of SNPs covered ≥8x. Our results offer guidance for “real-world” applications in human genetics and provide further evidence that microarray-based exome capture is an efficient and reliable method to enrich for chromosomal regions of interest in next-generation sequencing experiments.
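
Coverage summaries of the kind quoted above (share of targeted bases at read depth ≥3, ≥10 and ≥20) can be computed from a per-base depth list. The following is a minimal sketch, assuming the depths are already available as plain integers; the function name `fraction_covered` and the example depths are hypothetical and are not taken from the study.

```python
def fraction_covered(depths, thresholds=(3, 10, 20)):
    """Return, for each threshold, the fraction of target bases whose
    read depth is at least that threshold.

    `depths` is an assumed input: one integer read depth per targeted base.
    """
    total = len(depths)
    if total == 0:
        raise ValueError("no target bases supplied")
    return {t: sum(d >= t for d in depths) / total for t in thresholds}


if __name__ == "__main__":
    # Made-up depths for ten target bases, for illustration only.
    example_depths = [0, 2, 3, 5, 10, 12, 20, 25, 30, 40]
    for threshold, fraction in fraction_covered(example_depths).items():
        print(f">= {threshold}x: {fraction:.0%}")
```

In practice the depth list would come from an alignment tool's per-base depth output restricted to the capture target intervals; that step is outside this sketch.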

6.
Isolating high-priority segments of genomes greatly enhances the efficiency of next-generation sequencing (NGS) by allowing researchers to focus on their regions of interest. For the 2010–11 DNA Sequencing Research Group (DSRG) study, we compared outcomes from two leading companies, Agilent Technologies (Santa Clara, CA, USA) and Roche NimbleGen (Madison, WI, USA), which offer custom-targeted genomic enrichment methods. Both companies were provided with the same genomic sample and challenged to capture identical genomic locations for DNA NGS. The target region totaled 3.5 Mb and included 31 individual genes and a 2-Mb contiguous interval. Each company was asked to design its best assay, perform the capture in replicates, and return the captured material to the DSRG-participating laboratories. Sequencing was performed in two different laboratories on Genome Analyzer IIx systems (Illumina, San Diego, CA, USA). Sequencing data were analyzed for sensitivity, specificity, and coverage of the desired regions. The success of the enrichment was highly dependent on the design of the capture probes. Overall, coverage variability was higher for the Agilent samples. As variant discovery is the ultimate goal for a typical targeted sequencing project, we compared samples for their ability to sequence single-nucleotide polymorphisms (SNPs) as a test of the ability to capture both chromosomes from the sample. In the targeted regions, we detected 2546 SNPs with the NimbleGen samples and 2071 with Agilent's. When limited to the regions that both companies included as baits, the number of SNPs was ∼1000 for each, with Agilent and NimbleGen finding a small number of unique SNPs not found by the other.

7.
Molecular diagnosis of monogenic diabetes and obesity is of paramount importance for both the patient and society, as it can result in personalized medicine associated with a better life and it eventually saves health care spending. Genetic clinical laboratories are currently switching from Sanger sequencing to next-generation sequencing (NGS) approaches, but choosing the optimal protocols is not easy. Here, we compared the sequencing coverage of 43 genes involved in monogenic forms of diabetes and obesity, and variant detection rates, resulting from four enrichment methods based on the sonication of DNA (Agilent SureSelect, RainDance Technologies), or using enzymes for DNA fragmentation (Illumina Nextera, Agilent HaloPlex). We analyzed coding exons and untranslated regions of the 43 genes involved in monogenic diabetes and obesity. We found that none of the methods yet achieves full sequencing of the gene targets. Nonetheless, the RainDance, SureSelect and HaloPlex enrichment methods led to the best sequencing coverage of the targets, while the Nextera method resulted in the poorest sequencing coverage. Although the sequencing coverage was high, we unexpectedly found that the HaloPlex method missed 20% of variants detected by the three other methods and Nextera missed 10%. The question of which NGS technique for genetic diagnosis yields the highest diagnosis rate is frequently discussed in the literature and the answer is still unclear. Here, we showed that the RainDance enrichment method as well as SureSelect, which are both based on the sonication of DNA, resulted in good sequencing quality and variant detection, while the use of enzymes to fragment DNA (HaloPlex or Nextera) might not be the best strategy for accurate sequencing.

8.
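Exome sequencing is a technique that targets the protein-coding regions of the genome, enriching and sequencing exonic regions in order to discover disease-associated genetic variants. In recent years it has been used increasingly to discover low-frequency variants in the human genome, to identify the causative genes of monogenic disorders, and to study susceptibility genes for complex diseases such as cancer, and it has become an important tool for research on human disease-associated variants. This review covers the basic principles of exome sequencing and its application to research on human disease-associated genes.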

9.
To date we have little knowledge of how accurate next-generation sequencing (NGS) technologies are in sequencing repetitive sequences beyond known limitations to accurately sequence homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites) and no empirical study has compared and evaluated the performance of more than one NGS platform with the same dataset. Here we examined yeast microsatellite variants from both long-read (454-sequencing) and short-read (Illumina) NGS platforms and compared these to data derived through Sanger sequencing. In addition, we investigated any locus-specific biases and differences that might have resulted from variability in microsatellite repeat number, repeat motif or type of mutation. Out of 112 insertion/deletion variants identified among 45 microsatellite amplicons in our study, we found 87.5% agreement between the 454-platform and Sanger sequencing in frequency of variant detection after Benjamini-Hochberg correction for multiple tests. For a subset of 21 microsatellite amplicons derived from Illumina sequencing, the results of short-read platform were highly consistent with the other two platforms, with 100% agreement with 454-sequencing and 93.6% agreement with the Sanger method after Benjamini-Hochberg correction. We found that the microsatellite attributes copy number, repeat motif and type of mutation did not have a significant effect on differences seen between the sequencing platforms. We show that both long-read and short-read NGS platforms can be used to sequence short tandem repeats accurately, which makes it feasible to consider the use of these platforms in high-throughput genotyping. It appears the major requirement for achieving both high accuracy and rare variant detection in microsatellite genotyping is sufficient read depth coverage. This might be a challenge because each platform generates a consistent pattern of non-uniform sequence coverage, which, as our study suggests, may affect some types of tandem repeats more than others.

10.

Background

Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research and increasingly the clinical use of sequence data is likely to remain focused on the protein coding exome. We set out to quantify and understand how WGS compares with the targeted capture and sequencing of the exome (exome-seq), for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome targeted regions.

Results

We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that have been subject to both deep exome and whole genome sequencing. The scoring of detection sensitivity was based on sequence down sampling and reference to a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of sequence read coverage and reduced biases in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS only requires a mean of 14 reads. Known disease causing mutations are not biased towards easy or hard to sequence areas of the genome for either exome-seq or WGS.

Conclusions

From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers benefits in uniformity of read coverage and more balanced allele ratio calls, both of which can in most cases be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection due to technical difficulties or probe failures. As WGS is intrinsically richer data that can provide insight into polymorphisms outside coding regions and reveal genomic rearrangements, it is likely to progressively replace exome-seq for many applications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-247) contains supplementary material, which is available to authorized users.
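
The detection sensitivity discussed in this abstract is, at its simplest, the share of gold-standard SNPs that a call set recovers. The sketch below shows that ratio under stated assumptions: variants are represented as hypothetical `(chromosome, position, alt_allele)` tuples, and the function name `snp_sensitivity` is invented for illustration. It does not reproduce the authors' down-sampling procedure.

```python
def snp_sensitivity(called_sites, gold_sites):
    """Fraction of gold-standard SNPs that also appear in the call set.

    Both arguments are iterables of (chromosome, position, alt_allele)
    tuples; this representation is an assumption made for the example.
    """
    gold = set(gold_sites)
    if not gold:
        raise ValueError("gold-standard set is empty")
    return len(gold & set(called_sites)) / len(gold)


if __name__ == "__main__":
    # Made-up sites, for illustration only.
    gold = [("chr1", 100, "A"), ("chr1", 250, "T"),
            ("chr2", 75, "G"), ("chr3", 900, "C")]
    called = [("chr1", 100, "A"), ("chr2", 75, "G"),
              ("chr3", 900, "C"), ("chr5", 12, "T")]
    print(f"sensitivity: {snp_sensitivity(called, gold):.2f}")
```

Calls absent from the gold standard, such as the `chr5` site here, do not affect sensitivity; they would enter a separate false-positive measure.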

11.
Despite the ever-increasing throughput and steadily decreasing cost of next generation sequencing (NGS), whole genome sequencing of humans is still not a viable option for the majority of genetics laboratories. This is particularly true in the case of complex disease studies, where large sample sets are often required to achieve adequate statistical power. To fully leverage the potential of NGS technology on large sample sets, several methods have been developed to selectively enrich for regions of interest. Enrichment reduces both monetary and computational costs compared to whole genome sequencing, while allowing researchers to take advantage of NGS throughput. Several targeted enrichment approaches are currently available, including molecular inversion probe ligation sequencing (MIPS), oligonucleotide hybridization based approaches, and PCR-based strategies. To assess how these methods performed when used in conjunction with the ABI SOLID3+, we investigated three enrichment techniques: Nimblegen oligonucleotide hybridization array-based capture; Agilent SureSelect oligonucleotide hybridization solution-based capture; and Raindance Technologies' multiplexed PCR-based approach. Target regions were selected from exons and evolutionarily conserved areas throughout the human genome. Probe and primer pair design was carried out for all three methods using their respective informatics pipelines. In all, approximately 0.8 Mb of target space was identical for all 3 methods. SOLiD sequencing results were analyzed for several metrics, including consistency of coverage depth across samples, on-target versus off-target efficiency, allelic bias, and genotype concordance with array-based genotyping data. Agilent SureSelect exhibited superior on-target efficiency and correlation of read depths across samples. Nimblegen performance was similar at read depths of 20× and below. Both Raindance and Nimblegen SeqCap exhibited tighter distributions of read depth around the mean, but both suffered from lower on-target efficiency in our experiments. Raindance demonstrated the highest versatility in assay design.

12.
Usher syndrome (USH) is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS) technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of whom were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS sequencing platforms were used: SOLiD for whole exome sequencing, Illumina (Genome Analyzer II) and Roche 454 (GS FLX) for the Long-PCR sequencing. Long-PCR targeting was more efficient with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous) out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need for high coverage and to the correct interpretation of sequence variations with a non-obvious pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH that may also be caused by the presence of additional causative genes yet to be identified.

13.
Recent advancements of sequencing technology have opened up unprecedented opportunities in many application areas. Virus samples can now be sequenced efficiently with very deep coverage to infer the genetic diversity of the underlying virus populations. Several sequencing platforms with different underlying technologies and performance characteristics are available for viral diversity studies. Here, we investigate how the differences between two common platforms provided by 454/Roche and Illumina affect viral diversity estimation and the reconstruction of viral haplotypes. Using a mixture of ten HIV clones sequenced with both platforms and additional simulation experiments, we assessed the trade-off between sequencing coverage, read length, and error rate. For fixed costs, short Illumina reads can be generated at higher coverage and allow for detecting variants at lower frequencies. They can also be sufficient to assess the diversity of the sample if sequences are dissimilar enough, but, in general, assembly of full-length haplotypes is feasible only with the longer 454/Roche reads. The quantitative comparison highlights the advantages and disadvantages of both platforms and provides guidance for the design of viral diversity studies.

14.
The molecular diagnosis of muscle disorders is challenging: genetic heterogeneity (>100 causal genes for skeletal and cardiac muscle disease) precludes exhaustive clinical testing, prioritizing sequencing of specific genes is difficult due to the similarity of clinical presentation, and the number of variants returned through exome sequencing can make the identification of the disease-causing variant difficult. We have filtered variants found through exome sequencing by prioritizing variants in genes known to be involved in muscle disease while examining the quality and depth of coverage of those genes. We ascertained two families with autosomal dominant limb-girdle muscular dystrophy of unknown etiology. To identify the causal mutations in these families, we performed exome sequencing on five affected individuals using the Agilent SureSelect Human All Exon 50 Mb kit and the Illumina HiSeq 2000 (2×100 bp). We identified causative mutations in desmin (IVS3+3A>G) and filamin C (p.W2710X), and augmented the phenotype data for individuals with muscular dystrophy due to these mutations. We also discuss challenges encountered due to depth of coverage variability at specific sites and the annotation of a functionally proven splice site variant as an intronic variant.

15.
Sequence capture methods for targeted next generation sequencing promise to massively reduce cost of genomics projects compared to untargeted sequencing. However, evaluated capture methods specifically dedicated to biologically relevant genomic regions are rare. Whole exome capture has been shown to be a powerful tool to discover the genetic origin of disease and provides a reduction in target size and thus calculative sequencing capacity of > 90-fold compared to untargeted whole genome sequencing. For further cost reduction, a valuable complementing approach is the analysis of smaller, relevant gene subsets but involving large cohorts of samples. However, effective adjustment of target sizes and sample numbers is hampered by the limited scalability of enrichment systems. We report a highly scalable and automated method to capture a 480 Kb exome subset of 115 cancer-related genes using microfluidic DNA arrays. The arrays are adaptable from 125 Kb to 1 Mb target size and/or one to eight samples without barcoding strategies, representing a further 26 – 270-fold reduction of calculative sequencing capacity compared to whole exome sequencing. Illumina GAII analysis of a HapMap genome enriched for this exome subset revealed a completeness of > 96%. Uniformity was such that > 68% of exons had at least half the median depth of coverage. An analysis of reference SNPs revealed a sensitivity of up to 93% and a specificity of 98.2% or higher.

16.

Background

Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variations calls simultaneously.

Results

Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.

Conclusions

Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.
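
The platform concordance reported in this abstract can be read as the share of sites, among those genotyped on both platforms, where the two genotypes agree. The sketch below is one way to compute such a figure, assuming genotypes are stored in dictionaries keyed by a hypothetical `(chromosome, position)` tuple; the function name `genotype_concordance` and the example calls are invented, and the authors' exact definition may differ.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of shared sites at which two call sets report the same genotype.

    `calls_a` and `calls_b` map (chromosome, position) to a genotype string
    such as "A/G"; this layout is an assumption made for the example.
    """
    shared_sites = calls_a.keys() & calls_b.keys()
    if not shared_sites:
        raise ValueError("the two call sets share no sites")
    matches = sum(calls_a[site] == calls_b[site] for site in shared_sites)
    return matches / len(shared_sites)


if __name__ == "__main__":
    # Made-up genotypes, for illustration only.
    wgs_calls = {("chr1", 100): "A/G", ("chr1", 250): "T/T",
                 ("chr2", 75): "C/G", ("chr3", 900): "A/A"}
    wes_calls = {("chr1", 100): "A/G", ("chr1", 250): "T/T",
                 ("chr2", 75): "G/G", ("chr3", 900): "A/A",
                 ("chr7", 40): "C/T"}
    print(f"concordance: {genotype_concordance(wgs_calls, wes_calls):.2f}")
```

Sites called on only one platform, such as `chr7:40` here, are excluded from the denominator; counting them as discordant would be a different and stricter measure.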

17.
Chen D  Zhang W  Zhu ZD  Huang Y  Wang P  Zhou BB  Yang XN  Xiao HS  Zhang QH 《遗传》2010,32(12):1296-1303
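This article aimed to establish a method for constructing genomic target-sequence capture libraries and to combine it with second-generation sequencing in order to achieve deep sequencing of candidate gene regions. Using Agilent's eArray online platform, 120-base capture probes (baits) were designed against 2,414,977 bp of genomic sequence comprising 11,824 exons of 1,250 genes, and were prepared as a SureSelect solution-phase target capture reagent. Two human genomic DNA samples were sheared by sonication, end-repaired and phosphorylated, and ligated to SOLiD adaptors; DNA fragments of 150 bp to 200 bp were recovered and hybridized with the target-sequence probes to capture the target sequences. After water-in-oil emulsion PCR amplification and bead-based separation and enrichment, the libraries were loaded onto the SOLiD sequencing system either for library quality assessment by workflow analysis (WFA) or for a full sequencing run. In total, 46,509 capture probes were designed and synthesized for the 11,147 exon fragments included, and were prepared as a SureSelect kit. The probes effectively captured and enriched the target fragments from genomic DNA; quantitative PCR showed enrichment of up to 29-fold. WFA indicated that the libraries were suitable for a full sequencing run on the SOLiD instrument. Sequencing showed that reads in the target regions accounted for 70% of all valid reads, with coverage above 200× throughout. These results indicate that the SureSelect-based workflow established in this study for capturing and enriching genomic target sequences to build sequencing libraries is feasible and can be used directly for sequencing on the SOLiD platform.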

18.
Identification of the pathogenic mutations underlying autosomal recessive nonsyndromic hearing loss (ARNSHL) is difficult, since causative mutations in 39 different genes have so far been reported. After excluding mutations in the most common ARNSHL gene, GJB2, via Sanger sequencing, we performed whole-exome sequencing (WES) in 30 individuals from 20 unrelated multiplex consanguineous families with ARNSHL. Agilent SureSelect Human All Exon 50 Mb kits and an Illumina Hiseq2000 instrument were used. An average of 93%, 84% and 73% of bases were covered to 1X, 10X and 20X within the ARNSHL-related coding RefSeq exons, respectively. Uncovered regions with WES included those that are not targeted by the exome capture kit and regions with high GC content. Twelve homozygous mutations in known deafness genes, of which eight are novel, were identified in 12 families: MYO15A-p.Q1425X, -p.S1481P, -p.A1551D; LOXHD1-p.R1494X, -p.E955X; GIPC3-p.H170N; ILDR1-p.Q274X; MYO7A-p.G2163S; TECTA-p.Y1737C; TMC1-p.S530X; TMPRSS3-p.F13Lfs*10; TRIOBP-p.R785Sfs*50. Each mutation was within a homozygous run documented via WES. Sanger sequencing confirmed co-segregation of the mutation with deafness in each family. Four rare heterozygous variants, predicted to be pathogenic, in known deafness genes were detected in 12 families where homozygous causative variants were already identified. Six heterozygous variants that had similar characteristics to those abovementioned variants were present in 15 ethnically-matched individuals with normal hearing. Our results show that rare causative mutations in known ARNSHL genes can be reliably identified via WES. The excess of heterozygous variants should be considered during search for causative mutations in ARNSHL genes, especially in small-sized families.

19.
Specific HLA genotypes are known to be linked to either resistance or susceptibility to certain diseases or sensitivity to certain drugs. In addition, high accuracy HLA typing is crucial for organ and bone marrow transplantation. The most widespread high resolution HLA typing method used to date is Sanger sequencing based typing (SBT), and next generation sequencing (NGS) based HLA typing is just starting to be adopted as a higher throughput, lower cost alternative. By HLA typing the HapMap subset of the public 1000 Genomes paired Illumina data, we demonstrate that HLA-A, B and C typing is possible from exome sequencing samples with higher than 90% accuracy. The older 1000 Genomes whole genome sequencing read sets are less reliable and generally unsuitable for the purpose of HLA typing. We also propose using coverage % (the extent of exons covered) as a quality check (QC) measure to increase reliability.

20.
《Genomics》2020,112(2):1437-1443

Background

Whole Exome Sequencing (WES) utilises overlapping fragments prone to sequencing artefacts. Saliva, a non-invasive source of DNA, has been successfully used in WES studies on various platforms. This study explored the validity and quality of DNA sourced from saliva compared to whole blood on an Ion Platform.

Methods

DNA was extracted from both sample types from four individuals. WES, performed on the Ion Proton platform was assessed for quality metrics (Depth, Genotyping Quality, etc.) and variant identification for the same source sample-pairs.

Results

No significant differences in quality metrics were identified between data obtained from whole blood and saliva samples, with several saliva samples demonstrating higher coverage depth. Variants within the same sample, from the two genomic DNA sources, had an average concordance similar to other studies and platforms with different chemistry.

Conclusion

Saliva-extracted DNA provides comparable sequencing quality to whole blood for WES on Ion Torrent Platforms.
