Similar Articles
20 similar articles retrieved.
1.

Background

Techniques enabling targeted re-sequencing of the protein-coding sequences of the human genome on next-generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen (two kit versions from each manufacturer). A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.

Results

We developed a bioinformatics pipeline for quality control, short-read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high-quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency toward more reference than variant bases at heterozygous positions within the target regions for all methods. Genotype concordance with genotypes derived from SNP arrays was virtually identical across methods. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared with common SNPs on genome-wide association arrays.
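
A minimal sketch of the two per-site calculations described above, the allele balance at heterozygous positions and a depth threshold for reliable heterozygote calls. The per-site read counts and the input format are invented for illustration; this is not the study's pipeline.

```python
# Minimal sketch (not the authors' pipeline): compute the allele balance
# (reference reads / total reads) at heterozygous sites and summarize how
# often the balance is skewed toward the reference allele. The per-site
# ref/alt read counts below are invented.
het_sites = [
    # (ref_reads, alt_reads) at sites genotyped as heterozygous
    (14, 9), (22, 18), (7, 5), (31, 24), (12, 13), (19, 11),
]

balances = [ref / (ref + alt) for ref, alt in het_sites if ref + alt > 0]
mean_balance = sum(balances) / len(balances)
print(f"mean allele balance (ref fraction): {mean_balance:.3f}")
print(f"sites skewed toward reference (>0.5): "
      f"{sum(b > 0.5 for b in balances)}/{len(balances)}")

# The 11x threshold reported above could be derived by binning sites by total
# depth and measuring genotype concordance against SNP-array genotypes per bin.
min_depth_for_call = 11
callable_sites = [s for s in het_sites if sum(s) >= min_depth_for_call]
print(f"sites at or above {min_depth_for_call}x depth: "
      f"{len(callable_sites)}/{len(het_sites)}")
```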

Conclusions

Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit covered the exome most efficiently at a minimum coverage of 20×, yet none of the kits captured all of the Consensus Coding Sequence (CCDS)-annotated exons.

2.

Background

Recent developments in deep (next-generation) sequencing technologies are significantly impacting medical research. The global analysis of protein coding regions in genomes of interest by whole exome sequencing is a widely used application. Many technologies for exome capture are commercially available; here we compare the performance of four of them: NimbleGen’s SeqCap EZ v3.0, Agilent’s SureSelect v4.0, Illumina’s TruSeq Exome, and Illumina’s Nextera Exome, all applied to the same human tumor DNA sample.

Results

Each capture technology was evaluated for its coverage of different exome databases, target coverage efficiency, GC bias, sensitivity in single nucleotide variant detection, sensitivity in small indel detection, and technical reproducibility. In general, all technologies performed well; however, our data demonstrated small but consistent differences between the four capture technologies. Illumina technologies cover more bases in coding and untranslated regions. Furthermore, whereas most of the technologies provide reduced coverage in regions with low or high GC content, the Nextera technology tends to be biased towards target regions with high GC content.
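
One way to look for the GC bias mentioned above is to bin capture targets by GC content and compare mean coverage per bin. The sketch below uses invented target sequences and coverage values; real analyses would derive both from a BED file of targets and a per-base coverage track.

```python
# Illustrative sketch (assumed inputs, not the study's code): relate target-region
# GC content to mean capture coverage to look for GC bias. Each target is given
# here as (sequence, mean_coverage).
targets = [
    ("GGCGGCGGCCCCGGGCG", 30.0),   # GC-rich target
    ("ATTATATTTAAATTGCA", 78.0),   # AT-rich target
    ("ATGCATGCATGCATGCA", 92.0),   # balanced target
    ("ATGCGCGCGGCCGGAGC", 45.0),
]

def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

bins = {"low GC (<0.4)": [], "mid GC (0.4-0.6)": [], "high GC (>0.6)": []}
for seq, cov in targets:
    gc = gc_fraction(seq)
    key = ("low GC (<0.4)" if gc < 0.4
           else "high GC (>0.6)" if gc > 0.6
           else "mid GC (0.4-0.6)")
    bins[key].append(cov)

for label, covs in bins.items():
    mean_cov = sum(covs) / len(covs) if covs else float("nan")
    print(f"{label}: n={len(covs)}, mean coverage={mean_cov:.1f}")
```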

Conclusions

We show key differences in performance between the four technologies. Our data should help researchers who are planning exome sequencing to select an appropriate exome capture technology for their particular application.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-449) contains supplementary material, which is available to authorized users.

3.

Background

Aortopathies are a group of disorders characterized by aneurysms, dilation, and tortuosity of the aorta. Because of the phenotypic overlap and genetic heterogeneity of diseases featuring aortopathy, molecular testing is often required for a timely and correct diagnosis of affected individuals. In this setting, next-generation sequencing (NGS) offers several advantages over traditional molecular techniques.

Methods

The purpose of our study was to compare NGS enrichment methods for a clinical assay targeting the nine genes known to be associated with aortopathy. RainDance emulsion PCR and SureSelect RNA-bait hybridization capture enrichment methods were directly compared by enriching DNA from eight samples. Enriched samples were barcoded, pooled, and sequenced on the Illumina HiSeq 2000 platform. Depth of coverage, consistency of coverage across samples, and the overlap of variants identified were assessed. These data were also compared to whole-exome sequencing data from ten individuals.

Results

Read depth was greater and less variable among samples that had been enriched using the RNA-bait hybridization capture enrichment method. In addition, samples enriched by hybridization capture had fewer exons with mean coverage below 10×, reducing the need for follow-up Sanger sequencing. The variant sets produced were 77% concordant, with both techniques yielding similar numbers of discordant variants.
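
The kind of check described above, flagging exons whose mean depth falls below a threshold and would therefore need Sanger follow-up, can be expressed in a few lines. The exon names and depths below are hypothetical.

```python
# Hypothetical sketch: flag exons whose mean read depth falls below a threshold
# and would therefore need follow-up Sanger sequencing in a clinical assay.
# Exon labels and depth values are invented.
MIN_MEAN_DEPTH = 10

mean_depth_by_exon = {
    "FBN1_exon24": 156.2,
    "TGFBR2_exon4": 8.7,
    "ACTA2_exon2": 61.0,
    "MYH11_exon33": 9.9,
}

needs_sanger = [exon for exon, depth in mean_depth_by_exon.items()
                if depth < MIN_MEAN_DEPTH]
print(f"{len(needs_sanger)} exon(s) below {MIN_MEAN_DEPTH}x mean coverage: "
      f"{needs_sanger}")
```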

Conclusions

When comparing the design flexibility, performance, and cost of the targeted enrichment methods to whole-exome sequencing, the RNA-bait hybridization capture enrichment gene panel offers the better solution for interrogating the aortopathy genes in a clinical laboratory setting.

4.
Whole exome sequencing (WES) is increasingly used in research and diagnostics. WES users expect coverage of the entire coding region of known genes as well as sufficient read depth for the covered regions. It is, however, unknown which recent WES platform is most suitable to meet these expectations. We present insights into the performance of the most recent standard exome enrichment platforms from Agilent, NimbleGen and Illumina applied to six different DNA samples by two sequencing vendors per platform. Our results suggest that both Agilent and NimbleGen overall perform better than Illumina and that the high enrichment performance of Agilent is stable among samples and between vendors, whereas NimbleGen is only able to achieve vendor- and sample-specific best exome coverage. Moreover, the recent Agilent platform overall captures more coding exons with sufficient read depth than NimbleGen and Illumina. Due to considerable gaps in effective exome coverage, however, the three platforms cannot capture all known coding exons alone or in combination, requiring improvement. Our data emphasize the importance of evaluation of updated platform versions and suggest that enrichment-free whole genome sequencing can overcome the limitations of WES in sufficiently covering coding exons, especially GC-rich regions, and in characterizing structural variants.

5.

Background

High-throughput sequencing has opened up exciting possibilities in population and conservation genetics by enabling the assessment of genetic variation at genome-wide scales. One approach to reduce genome complexity, i.e. investigating only parts of the genome, is reduced-representation library (RRL) sequencing. Like similar approaches, RRL sequencing reduces ascertainment bias due to simultaneous discovery and genotyping of single-nucleotide polymorphisms (SNPs) and does not require reference genomes. Yet, generating such datasets remains challenging due to laboratory and bioinformatic issues. In the laboratory, current protocols require improvements with regard to sequencing homologous fragments to reduce the number of missing genotypes. From the bioinformatic perspective, the reliance of most studies on a single SNP caller disregards the possibility that different algorithms may produce disparate SNP datasets.

Results

We present an improved RRL (iRRL) protocol that maximizes the generation of homologous DNA sequences, thus achieving improved genotyping-by-sequencing efficiency. Our modifications facilitate generation of single-sample libraries, enabling individual genotype assignments instead of pooled-sample analysis. We sequenced ~1% of the orangutan genome with 41-fold median coverage in 31 wild-born individuals from two populations. SNPs and genotypes were called using three different algorithms. We obtained substantially different SNP datasets depending on the SNP caller. Genotype validations revealed that the Unified Genotyper of the Genome Analysis Toolkit and SAMtools performed significantly better than a caller from CLC Genomics Workbench (CLC). Of all conflicting genotype calls, CLC was only correct in 17% of the cases. Furthermore, conflicting genotypes between two algorithms showed a systematic bias in that one caller almost exclusively assigned heterozygotes, while the other one almost exclusively assigned homozygotes.
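
The caller disagreement described above can be summarized by comparing genotype calls at shared sites and asking whether conflicts show a systematic heterozygote/homozygote pattern. The sketch below uses invented call sets coded as alternate-allele counts; it is not the study's workflow.

```python
# Sketch under assumed inputs (not the study's pipeline): compare genotype calls
# from two SNP callers at shared sites and check whether disagreements show a
# systematic heterozygote/homozygote bias. Genotypes are alt-allele counts.
calls_caller_a = {"chr1:1000": 1, "chr1:2000": 2, "chr2:500": 1,
                  "chr2:900": 0, "chr3:42": 1}
calls_caller_b = {"chr1:1000": 1, "chr1:2000": 1, "chr2:500": 2,
                  "chr2:900": 0, "chr3:42": 2}

shared = set(calls_caller_a) & set(calls_caller_b)
conflicts = [(s, calls_caller_a[s], calls_caller_b[s])
             for s in shared if calls_caller_a[s] != calls_caller_b[s]]

a_het_b_hom = sum(1 for _, a, b in conflicts if a == 1 and b != 1)
b_het_a_hom = sum(1 for _, a, b in conflicts if b == 1 and a != 1)
print(f"shared sites: {len(shared)}, conflicting: {len(conflicts)}")
print(f"caller A heterozygous / caller B homozygous: {a_het_b_hom}")
print(f"caller B heterozygous / caller A homozygous: {b_het_a_hom}")
```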

Conclusions

Our enhanced iRRL approach greatly facilitates genotyping-by-sequencing and thus direct estimates of allele frequencies. Our direct comparison of three commonly used SNP callers emphasizes the need to question the accuracy of SNP and genotype calling, as we obtained considerably different SNP datasets depending on caller algorithms, sequencing depths and filtering criteria. These differences affected scans for signatures of natural selection and would also unduly influence demographic inferences. This study presents the first effort to generate a population genomic dataset for wild-born orangutans with known population provenance.

6.

Background

Validation of single-nucleotide variants in whole-genome sequencing is critical for studying disease-related variation in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single-nucleotide variant calls simultaneously.

Results

Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.

Conclusions

Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.

7.

Background

Recent advances in human genome sequencing and genotyping have revealed millions of single nucleotide polymorphisms (SNPs) that underlie variation among human beings. One particularly important project is the International HapMap Project, which provides a catalogue of human genetic variation for disease association studies. In this paper, we analyzed the genotype data in the HapMap project using the National Institute of Environmental Health Sciences Environmental Genome Project (NIEHS EGP) SNPs. We first determined whether the HapMap data are transferable to the NIEHS data. Then, we studied how well the HapMap SNPs capture the untyped SNPs in the region. Finally, we provide general guidelines for determining whether the SNPs chosen from HapMap are able to capture most of the untyped SNPs.

Results

Our analysis shows that HapMap data are not robust enough to capture the untyped variants for most human genes. SNPs chosen for the European and Asian samples are only marginal in capturing the untyped variants, covering approximately 55% of them; as expected, SNPs from the HapMap YRI panel capture only approximately 30%. Although the overall performance is low, the SNPs for some genes perform very well and are able to capture most of the variants along the gene. This is observed in the European and Asian panels, but not in the African panel. We conclude that, to obtain a well-covering SNP reference panel, both the SNP density and the association (linkage disequilibrium) among reference SNPs are important for estimating the robustness of the chosen SNPs.
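
"Capturing" an untyped SNP is usually scored by checking whether it is in strong linkage disequilibrium (commonly r² ≥ 0.8) with at least one typed panel SNP. The sketch below illustrates that calculation on toy genotype matrices; the threshold, panel sizes and data are assumptions, and real LD estimation would use phased or composite-LD methods.

```python
# Illustrative sketch (toy genotypes, assumed r^2 threshold): estimate how well a
# reference panel of typed SNPs captures untyped SNPs by checking whether each
# untyped SNP has squared genotype correlation >= 0.8 with any panel SNP.
import numpy as np

rng = np.random.default_rng(0)
n_individuals = 60
panel = rng.integers(0, 3, size=(5, n_individuals))     # typed (panel) SNP genotypes
untyped = rng.integers(0, 3, size=(8, n_individuals))   # untyped SNP genotypes

def best_r2(target, panel_genotypes):
    r2 = 0.0
    for tag in panel_genotypes:
        if np.std(tag) == 0 or np.std(target) == 0:
            continue
        r = np.corrcoef(target, tag)[0, 1]
        r2 = max(r2, r * r)
    return r2

captured = sum(best_r2(snp, panel) >= 0.8 for snp in untyped)
print(f"untyped SNPs captured at r^2 >= 0.8: {captured}/{len(untyped)}")
```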

Conclusion

We have analyzed the coverage of HapMap SNPs using NIEHS EGP data. The results show that HapMap SNPs are transferable to the NIEHS SNPs. However, HapMap SNPs cannot capture some of the untyped SNPs, and therefore resequencing may be needed to uncover additional SNPs in the missing regions.

8.

Background

To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between the Ion Proton and Illumina sequencing platforms such as HiSeq 2000, the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq™ Exome Enrichment Kit makes up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer.

Results

Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0% to 75.3% across the four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validation rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than that of SNPs specific to TargetSeq-Proton (60.0%) or to SureSelect-HiSeq (88.3%). For 1-bp insertions and deletions (InDels), the Sanger validation rates of concordant variants (100.0%) and SureSelect-HiSeq-specific variants (89.6%) were higher than that of TargetSeq-Proton-specific variants (15.8%).

Conclusions

In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant-calling specificity for concordant variant loci and the sensitivity for variant loci called by either platform. However, for platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant-calling software also influences the detection of SNPs and, in particular, InDels in Ion Proton exome sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1796-6) contains supplementary material, which is available to authorized users.

9.

Background

Identity by descent (IBD) has played a fundamental role in the discovery of genetic loci underlying human diseases. Both pedigree-based and population-based linkage analyses rely on estimating recent IBD, and evidence of ancient IBD can be used to detect population structure in genetic association studies. Various methods for detecting IBD, including those implemented in the software programs fastIBD and GERMLINE, have been developed in the past several years using population genotype data from microarray platforms. Now, next-generation DNA sequencing data is becoming increasingly available, enabling the comprehensive analysis of genomes, including identifying rare variants. These sequencing data may provide an opportunity to detect IBD with higher resolution than previously possible, potentially enabling the detection of disease-causing loci that were previously undetectable with sparser genetic data.

Results

Here, we investigate how different levels of variant coverage in sequencing and microarray genotype data influence the resolution at which IBD can be detected. This includes microarray genotype data from the WTCCC study, denser genotype data from the HapMap Project, low-coverage sequencing data from the 1000 Genomes Project, and deep-coverage complete genome data from our own projects. With sequencing data, we can detect segments of length 0.4 cM or larger with high power (78%) using fastIBD and GERMLINE; achieving similar power with microarray genotype data requires segments of length 1.0 cM or larger. We find that GERMLINE has slightly higher power than fastIBD for detecting IBD segments in sequencing data, but also has a much higher false positive rate.
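
Detection power of the sort quoted above is typically estimated as the fraction of known (simulated) IBD segments that a detector recovers, stratified by segment length. The sketch below shows that bookkeeping with invented intervals and a simple 50% overlap rule; the overlap criterion is an assumption, not the study's definition.

```python
# Hedged sketch (invented intervals, assumed overlap rule): estimate detection
# power as the fraction of true IBD segments recovered by reported segments,
# stratified by segment length in cM.
true_segments = [  # (chrom, start_cM, end_cM)
    ("chr1", 10.0, 10.4), ("chr1", 50.0, 51.2),
    ("chr2", 5.0, 5.5), ("chr2", 30.0, 32.0),
]
detected_segments = [
    ("chr1", 50.1, 51.0), ("chr2", 30.2, 31.8),
]

def is_detected(true_seg, detections, min_overlap_frac=0.5):
    chrom, start, end = true_seg
    for d_chrom, d_start, d_end in detections:
        if d_chrom != chrom:
            continue
        overlap = max(0.0, min(end, d_end) - max(start, d_start))
        if overlap / (end - start) >= min_overlap_frac:
            return True
    return False

for min_len in (0.4, 1.0):
    eligible = [s for s in true_segments if (s[2] - s[1]) >= min_len]
    hits = sum(is_detected(s, detected_segments) for s in eligible)
    power = hits / len(eligible) if eligible else float("nan")
    print(f"segments >= {min_len} cM: power = {power:.2f} ({hits}/{len(eligible)})")
```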

Conclusion

We further quantify the effect of variant density, conditional on genetic map length, on the power to resolve IBD segments. These investigations into IBD resolution may help guide the design of future next-generation sequencing studies that utilize IBD, including family-based association studies, association studies in admixed populations, and homozygosity mapping studies.

10.

Key message

Imputing genotypes from the 90K SNP chip to exome sequence in wheat was moderately accurate. We investigated the factors that affect imputation and propose several strategies to improve accuracy.

Abstract

Imputing genetic marker genotypes from low to high density has been proposed as a cost-effective strategy to increase the power of downstream analyses (e.g. genome-wide association studies and genomic prediction) for a given budget. However, imputation is often imperfect and its accuracy depends on several factors. Here, we investigate the effects of reference population selection algorithms, marker density and imputation algorithms (Beagle4 and FImpute) on the accuracy of imputation from low SNP density (the 9K array) to the Infinium 90K single-nucleotide polymorphism (SNP) array for a collection of 837 hexaploid wheat Watkins landrace accessions. Based on these results, we then used the best-performing reference selection and imputation algorithms to investigate imputation from 90K to exome sequence for a collection of 246 globally diverse wheat accessions. Accession-to-nearest-entry and genomic relationship-based methods were the best-performing selection algorithms, and FImpute resulted in higher accuracy and was more efficient than Beagle4. The accuracy of imputing exome capture SNPs was comparable to that of imputing from 9K to 90K, at approximately 0.71. This relatively low imputation accuracy is in part due to inconsistency between the 90K and exome sequence formats. We also found that the accuracy of imputation could be substantially improved, to 0.82, by choosing an equivalent number of exome SNPs, instead of the 90K SNPs on the existing array, as the lower-density set. We present a number of recommendations to increase the accuracy of exome imputation.
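
Imputation accuracy values like the 0.71 and 0.82 quoted above are commonly computed as the correlation between true and imputed genotype dosages at masked markers, averaged across markers. The sketch below shows only that scoring step on toy matrices; the imputed dosages are simulated stand-ins, and whether the study used exactly this metric is an assumption.

```python
# Minimal sketch (toy matrices): score imputation accuracy as the Pearson
# correlation between true and imputed genotype dosages at masked markers,
# averaged across markers. Data and variable names are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
true_genotypes = rng.integers(0, 3, size=(10, 50)).astype(float)  # markers x accessions
noise = rng.normal(0, 0.6, size=true_genotypes.shape)
imputed_dosages = np.clip(true_genotypes + noise, 0, 2)           # stand-in for imputed values

per_marker_r = []
for truth, imputed in zip(true_genotypes, imputed_dosages):
    if np.std(truth) == 0 or np.std(imputed) == 0:
        continue
    per_marker_r.append(np.corrcoef(truth, imputed)[0, 1])

print(f"mean per-marker imputation accuracy (r): {np.mean(per_marker_r):.2f}")
```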

11.
12.

Background

Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.

Results

Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base-calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur.

Conclusions

Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.

13.

Background

Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease-causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research, and increasingly the clinical use of sequence data, is likely to remain focused on the protein-coding exome. We set out to quantify and understand how WGS compares with targeted capture and sequencing of the exome (exome-seq) for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome-targeted regions.

Results

We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that have been subject to both deep exome and whole genome sequencing. The scoring of detection sensitivity was based on sequence downsampling and reference to a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of sequence read coverage and reduced biases in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS requires a mean of only 14 reads. Known disease-causing mutations are not biased towards easy- or hard-to-sequence areas of the genome for either exome-seq or WGS.
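
The downsampling-based sensitivity scoring mentioned above amounts to comparing the calls made at each reduced depth against a gold-standard call set. The sketch below shows only that final comparison with invented call sets; the actual read subsampling and variant calling steps are omitted.

```python
# Sketch under assumed inputs: estimate SNP detection sensitivity as a function of
# mean depth by comparing calls made from downsampled data against a gold-standard
# call set. The per-depth call sets below are invented stand-ins.
gold_standard = {"chr1:100", "chr1:250", "chr2:75", "chr2:300", "chr3:10"}

calls_by_mean_depth = {   # mean on-target depth -> variants called at that depth
    14: {"chr1:100", "chr1:250", "chr2:75", "chr2:300", "chr3:10"},
    10: {"chr1:100", "chr1:250", "chr2:300", "chr3:10"},
    5:  {"chr1:100", "chr2:300"},
}

for depth in sorted(calls_by_mean_depth, reverse=True):
    called = calls_by_mean_depth[depth]
    sensitivity = len(called & gold_standard) / len(gold_standard)
    print(f"mean depth {depth:>2}x: sensitivity = {sensitivity:.0%}")
```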

Conclusions

From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers benefits in uniformity of read coverage and more balanced allele ratio calls, both of which can in most cases be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection due to technical difficulties or probe failures. As WGS is intrinsically richer data that can provide insight into polymorphisms outside coding regions and reveal genomic rearrangements, it is likely to progressively replace exome-seq for many applications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-247) contains supplementary material, which is available to authorized users.

14.

Background

Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium- or low-coverage data (e.g., <15×). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates.

Results

We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.
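
The core idea, estimating the allele frequency directly from per-individual genotype likelihoods rather than from hard genotype calls, can be illustrated with a small EM-style loop under Hardy-Weinberg genotype priors. This is a generic sketch with invented likelihood values, not the authors' implementation or software.

```python
# Minimal sketch (not the authors' method or software): estimate a site's
# alternate-allele frequency from per-individual genotype likelihoods with an
# EM loop using Hardy-Weinberg genotype priors, instead of calling genotypes
# first. The likelihood values below are invented.
genotype_likelihoods = [
    # P(reads | genotype) for genotypes carrying 0, 1, 2 alternate alleles
    (0.90, 0.09, 0.01),
    (0.40, 0.55, 0.05),
    (0.05, 0.60, 0.35),
    (0.01, 0.10, 0.89),
    (0.80, 0.18, 0.02),
]

freq = 0.2  # starting guess for the alternate-allele frequency
for _ in range(50):
    expected_alt = 0.0
    for l0, l1, l2 in genotype_likelihoods:
        prior = ((1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2)
        post = [l * p for l, p in zip((l0, l1, l2), prior)]
        total = sum(post)
        post = [p / total for p in post]
        # expected number of alternate alleles carried by this individual
        expected_alt += post[1] + 2 * post[2]
    freq = expected_alt / (2 * len(genotype_likelihoods))

print(f"estimated alternate-allele frequency: {freq:.3f}")
```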

Conclusions

Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.  相似文献   

15.

Background

Recent studies suggest that epigenetic factors may play an important role in the pathogenesis of Parkinson’s disease (PD). In our previous work, we sequenced the exomes of sixteen patients from eight Chinese PD families using whole exome sequencing technology, and three patients from different pedigrees were found to share the variant c.1460C>T (rs150689919) in the coding region of the Tet methylcytosine dioxygenase 1 (TET1) gene.

Methods

In order to evaluate the possible association between sporadic PD and the single nucleotide polymorphism (SNP) rs150689919 in TET1, a case–control study was conducted in 514 sporadic PD patients and 529 normal controls. Genotypes were determined by PCR and direct sequencing. Statistical significance was assessed with the chi-squared test.
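
A chi-squared allele-count test of this kind takes a 2×2 table of allele counts in cases and controls. The counts below are invented for illustration and are not the study's data.

```python
# Illustrative sketch of the association test described above (invented counts):
# a chi-squared test comparing allele counts between cases and controls at one SNP.
from scipy.stats import chi2_contingency

#                  C allele   T allele
allele_counts = [[   950,        78],    # PD cases    (2 alleles per subject)
                 [   985,        73]]    # controls

chi2, p_value, dof, expected = chi2_contingency(allele_counts)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
```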

Results

There was no statistically significant difference in TET1 rs150689919 genotype or allele frequencies between PD cases and healthy controls, even after stratification by gender and age at onset.

Conclusions

Our findings suggest that rs150689919 in TET1 may not be associated with PD in the Chinese population. However, given the limited data in this study, replication in larger samples and in other populations is required.

16.

Background

The domestic pig (Sus scrofa) is both an important livestock species and a model for biomedical research. Exome sequencing has accelerated identification of protein-coding variants underlying phenotypic traits in human and mouse. We aimed to develop and validate a similar resource for the pig.

Results

We developed probe sets to capture pig exonic sequences based upon the current Ensembl pig gene annotation supplemented with mapped expressed sequence tags (ESTs), and demonstrated proof-of-principle capture and sequencing of the pig exome in 96 pigs, encompassing 24 capture experiments. For most of the samples, at least 10× sequence coverage was achieved for more than 90% of the target bases. Bioinformatic analysis of the data revealed over 236,000 high-confidence predicted SNPs and over 28,000 predicted indels.

Conclusions

We have achieved coverage statistics similar to those seen with commercially available human and mouse exome kits. Exome capture in pigs provides a tool to identify coding-region variation associated with production traits, including loss-of-function mutations which may explain embryonic and neonatal losses, and to improve genomic assemblies in the vicinity of protein-coding genes in the pig.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-550) contains supplementary material, which is available to authorized users.

17.

Background

Structural rearrangements form a major class of somatic variation in cancer genomes. Local chromosome shattering, termed chromothripsis, is a mechanism proposed to be the cause of clustered chromosomal rearrangements and was recently described to occur in a small percentage of tumors. The significance of these clusters for tumor development or metastatic spread is largely unclear.

Results

We used genome-wide long mate-pair sequencing and SNP array profiling to reveal that chromothripsis is a widespread phenomenon in primary colorectal cancer and metastases. We find large and small chromothripsis events in nearly every colorectal tumor sample and show that several breakpoints of chromothripsis clusters and isolated rearrangements affect cancer genes, including NOTCH2, EXO1 and MLL3. We complemented the structural variation studies by sequencing the coding regions of a cancer exome in all colorectal tumor samples and found somatic mutations in 24 genes, including APC, KRAS, SMAD4 and PIK3CA. A pairwise comparison of somatic variations in primary and metastatic samples indicated that many chromothripsis clusters, isolated rearrangements and point mutations are exclusively present in either the primary tumor or the metastasis and may affect cancer genes in a lesion-specific manner.

Conclusions

We conclude that chromothripsis is a prevalent mechanism driving structural rearrangements in colorectal cancer and show that a complex interplay between point mutations, simple copy number changes and chromothripsis events drive colorectal tumor development and metastasis.

18.

Background

Whole exome sequencing (WES) is the state-of-the-art method for identification of pathogenic mutations in patients with a Mendelian disorder. WES comprehensively covers the coding sequence of the genome and is a fast and cost-effective technique.

Purpose

As most of the technical difficulties have been overcome for WES, the major issue is data processing and analysis to find the pathogenic sequence variation among tens of thousands of sequence changes. Bioinformatic analysis pipelines for filtering sequence variants have to be adapted according to the patients and family members examined by WES and the most likely inheritance pattern underlying the disease.
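
Filtering according to the assumed inheritance pattern, as described above, typically reduces to simple rules over trio genotypes. The sketch below shows a generic version of two such rules on a hypothetical genotype table; it is not any published pipeline, and gene- and frequency-based filters are omitted.

```python
# Hedged sketch of an inheritance-based filtering step (hypothetical genotype
# table, not a published pipeline): keep candidate variants that fit an assumed
# model in a parent-child trio. Genotypes are alternate-allele counts (0, 1, 2).
variants = [
    {"id": "var1", "proband": 1, "mother": 0, "father": 0},   # possible de novo
    {"id": "var2", "proband": 2, "mother": 1, "father": 1},   # fits recessive model
    {"id": "var3", "proband": 1, "mother": 1, "father": 0},   # inherited heterozygous
]

def fits_de_novo(v):
    return v["proband"] >= 1 and v["mother"] == 0 and v["father"] == 0

def fits_recessive(v):
    return v["proband"] == 2 and v["mother"] == 1 and v["father"] == 1

for model, test in (("de novo", fits_de_novo), ("autosomal recessive", fits_recessive)):
    hits = [v["id"] for v in variants if test(v)]
    print(f"{model}: {hits}")
```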

Possible approaches

Based on four cases, we describe different variant prioritization strategies that led to the identification of the most likely causative changes in the index patients.

19.
Song, Giltae; Hsu, Chih-Hao; Riemer, Cathy; Miller, Webb. BMC Bioinformatics, 2011, 12(1): 1-7

Background

Several platforms for the analysis of genome-wide association data are available. However, these platforms focus on the evaluation of the genotype inherited by affected (i.e. case) individuals, whereas for some conditions (e.g. birth defects) the genotype of the mothers of affected individuals may also contribute to risk. For such conditions, it is critical to evaluate associations with both the maternal and the inherited (i.e. case) genotype. When genotype data are available for case-parent triads, a likelihood-based approach using log-linear modeling can be used to assess both the maternal and inherited genotypes. However, available software packages for log-linear analyses are not well suited to the analysis of typical genome-wide association data (e.g., data with missing genotypes).

Results

An integrated platform, Maternal and Inherited Analyses for Genome-wide Association Studies (MI-GWAS), for log-linear analyses of maternal and inherited genetic effects in large, genome-wide datasets is described. MI-GWAS uses SAS and LEM software in combination to appropriately format data, perform the log-linear analyses and summarize the results. This platform was evaluated using existing genome-wide data and was shown to perform accurately and relatively efficiently.

Conclusions

The MI-GWAS platform provides a valuable tool for the analysis of association of a phenotype or condition with maternal and inherited genotypes using genome-wide data from case-parent triads. The source code for this platform is freely available at http://www.sph.uth.tmc.edu/sbrr/mi-gwas.htm.

20.
Targeted sequencing is a cost-efficient way to obtain answers to biological questions in many projects, but the choice of enrichment method can be difficult. In this study we compared two hybridization methods for target enrichment for massively parallel sequencing and single nucleotide polymorphism (SNP) discovery, namely NimbleGen sequence capture arrays and the SureSelect liquid-based hybrid capture system. We prepared sequencing libraries from three HapMap samples using both methods, sequenced the libraries on the Illumina Genome Analyzer, mapped the sequencing reads back to the genome, and called variants in the sequences. In the SureSelect libraries, 74-75% of the sequence reads originated from the targeted region, compared with 41-67% in the NimbleGen libraries. We could sequence up to 99.9% of the regions targeted by capture probes from the SureSelect libraries and 99.5% from the NimbleGen libraries. The NimbleGen probes covered 0.6 Mb more of the original 3.1 Mb target region than the SureSelect probes. In each sample, we called more SNPs and detected more novel SNPs from the libraries that were prepared using the NimbleGen method. Thus the NimbleGen method gave better results when judged by the number of SNPs called, but this came at the cost of more over-sampling.
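
The on-target percentages quoted above come from counting aligned reads whose mapped position overlaps a capture target interval. The sketch below illustrates that calculation with invented coordinates; a real pipeline would stream a BAM file against a BED file of targets rather than use hard-coded lists.

```python
# Minimal sketch (invented coordinates): compute the fraction of aligned reads
# that overlap a capture target interval, i.e. the "on-target" percentage.
targets = {"chr1": [(1000, 2000), (5000, 5600)], "chr2": [(300, 900)]}

reads = [  # (chrom, start, end) of aligned reads
    ("chr1", 1500, 1576), ("chr1", 4000, 4076),
    ("chr2", 850, 926), ("chr2", 10000, 10076),
]

def on_target(read, target_intervals):
    chrom, start, end = read
    return any(start < t_end and end > t_start
               for t_start, t_end in target_intervals.get(chrom, []))

n_on_target = sum(on_target(r, targets) for r in reads)
print(f"on-target reads: {n_on_target}/{len(reads)} "
      f"({100 * n_on_target / len(reads):.0f}%)")
```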
