Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Mapping-by-sequencing analyses have largely required a complete reference sequence and employed whole genome re-sequencing. In species such as wheat, no finished genome reference sequence is available. Additionally, because of its large genome size (17 Gb), re-sequencing at sufficient depth of coverage is not practical. Here, we extend the utility of mapping by sequencing, developing a bespoke pipeline and algorithm to map an early-flowering locus in einkorn wheat (Triticum monococcum L.) that is closely related to the bread wheat genome A progenitor. We have developed a genomic enrichment approach using the gene-rich regions of hexaploid bread wheat to design a 110-Mbp NimbleGen SeqCap EZ in-solution capture probe set, representing the majority of genes in wheat. We use the capture probe set to enrich and sequence an F2 mapping population of the mutant. The mutant locus was identified in T. monococcum, which lacks a complete genome reference sequence, by mapping the enriched data set onto pseudo-chromosomes derived from the capture probe target sequence, with a long-range order of genes based on synteny of wheat with Brachypodium distachyon. Using this approach we are able to map the region and identify a set of deleted genes within the interval.

2.
Asan, Xu Y, Jiang H, Tyler-Smith C, Xue Y, Jiang T, Wang J, Wu M, Liu X, Tian G, Wang J, Wang J, Yang H, Zhang X. Genome Biology, 2011, 12(9): R95-12

Background

Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.

Results

We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.

Conclusions

We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.

3.
Because the new Proton platform from Life Technologies produced markedly different data from those of the Illumina platform, the conventional Illumina data analysis pipeline could not be used directly. We developed an optimized SNP calling method using TMAP and GATK (OTG-snpcaller). This method combined our own optimized processes, Remove Duplicates According to AS Tag (RDAST) and Alignment Optimize Structure (AOS), together with TMAP and GATK, to call SNPs from Proton data. We sequenced four sets of exomes captured by Agilent SureSelect and NimbleGen SeqCap EZ Kit, using Life Technologies' Ion Proton sequencer. Then we applied OTG-snpcaller and compared our results with the results from Torrent Variants Caller. The results indicated that OTG-snpcaller can reduce both false positive and false negative rates. Moreover, we compared our results with Illumina results generated by GATK best practices, and we found that the results of these two platforms were comparable. The good performance in variant calling using GATK best practices can be primarily attributed to the high quality of the Illumina sequences.

4.

Background

Techniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.

Results

We developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays.

Conclusions

Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons.

5.
Isolating high-priority segments of genomes greatly enhances the efficiency of next-generation sequencing (NGS) by allowing researchers to focus on their regions of interest. For the 2010–11 DNA Sequencing Research Group (DSRG) study, we compared outcomes from two leading companies, Agilent Technologies (Santa Clara, CA, USA) and Roche NimbleGen (Madison, WI, USA), which offer custom-targeted genomic enrichment methods. Both companies were provided with the same genomic sample and challenged to capture identical genomic locations for DNA NGS. The target region totaled 3.5 Mb and included 31 individual genes and a 2-Mb contiguous interval. Each company was asked to design its best assay, perform the capture in replicates, and return the captured material to the DSRG-participating laboratories. Sequencing was performed in two different laboratories on Genome Analyzer IIx systems (Illumina, San Diego, CA, USA). Sequencing data were analyzed for sensitivity, specificity, and coverage of the desired regions. The success of the enrichment was highly dependent on the design of the capture probes. Overall, coverage variability was higher for the Agilent samples. As variant discovery is the ultimate goal for a typical targeted sequencing project, we compared samples for their ability to sequence single-nucleotide polymorphisms (SNPs) as a test of the ability to capture both chromosomes from the sample. In the targeted regions, we detected 2546 SNPs with the NimbleGen samples and 2071 with Agilent's. When limited to the regions that both companies included as baits, the number of SNPs was ∼1000 for each, with Agilent and NimbleGen finding a small number of unique SNPs not found by the other.

6.
Target-capture approaches have improved over the past years, proving to be a very efficient tool for selectively sequencing genetic regions of interest. These methods have also allowed noninvasive samples such as faeces (characterized by their low quantity and quality of endogenous DNA) to be used in conservation genomic, evolutionary and population genetic studies. Here we aim to test different protocols and strategies for exome capture using the Roche SeqCap EZ Developer kit (57.5 Mb). First, we captured a complex pool of DNA libraries. Second, we assessed the influence of using more than one faecal sample, extract and/or library from the same individual, to evaluate its effect on the molecular complexity of the experiment. We validated our experiments with 18 chimpanzee faecal samples collected from two field sites as a part of the Pan African Programme: The Cultured Chimpanzee. Those two field sites are in Kibale National Park, Uganda (N = 9) and Loango National Park, Gabon (N = 9). We demonstrate that at least 16 libraries can be pooled, target enriched through hybridization, and sequenced, allowing for the genotyping of 951,949 exome markers for population genetic analyses. Further, we observe that molecule richness, and thus data acquisition, increases when using multiple libraries from the same extract or multiple extracts from the same sample. Finally, repeated captures significantly decrease the proportion of off-target reads from 34.15% after one capture round to 7.83% after two capture rounds, supporting our conclusion that two rounds of target enrichment are advisable when using complex faecal samples.

7.
The unprecedented increase in the throughput of DNA sequencing driven by next-generation technologies now allows efficient analysis of the complete protein-coding regions of genomes (exomes) for multiple samples in a single sequencing run. However, sample preparation and targeted enrichment of multiple samples has become a rate-limiting and costly step in high-throughput genetic analysis. Here we present an efficient protocol for parallel library preparation and targeted enrichment of pooled multiplexed bar-coded samples. The procedure is compatible with microarray-based and solution-based capture approaches. The high flexibility of this method allows multiplexing of 3-5 samples for whole-exome experiments, 20 samples for targeted footprints of 5 Mb and 96 samples for targeted footprints of 0.4 Mb. From library preparation to post-enrichment amplification, including hybridization time, the protocol takes 5-6 d for array-based enrichment and 3-4 d for solution-based enrichment. Our method provides a cost-effective approach for a broad range of applications, including targeted resequencing of large sample collections (e.g., follow-up genome-wide association studies), and whole-exome or custom mini-genome sequencing projects. This protocol gives details for a single-tube procedure, but scaling to a manual or automated 96-well plate format is possible and discussed.

8.
Although per-base sequencing costs have decreased during recent years, library preparation for targeted massively parallel sequencing remains constrained by high reagent cost, limited design flexibility, and protocol complexity. To address these limitations, we previously developed Hi-Plex, a polymerase chain reaction (PCR) massively parallel sequencing strategy for screening panels of genomic target regions. Here, we demonstrate that Hi-Plex applied with hybrid adapters can generate a library suitable for sequencing with both the Ion Torrent and the TruSeq chemistries and that adjusting primer concentrations improves coverage uniformity. These results expand Hi-Plex capabilities as an accurate, affordable, flexible, and rapid approach for various genetic screening applications.

9.
We discuss pooling methods of mutation detection for identifying rare mutations. We provide mathematical formulae for obtaining the optimal pool size as a function of the mutation frequency in the study population and the specificity of the test. The optimal pool size depends strongly on the specificity of the test. With a test that has 99% specificity, pooling can reduce the number of tests that need to be performed by 80%, whereas, with a test with 95% specificity, pooling reduces the number of samples that must be tested by only 50%. We used the software PHRED to call mutations after sequencing of pooled samples with known STK11 mutations. We found that, when the area under the curve for the less prominent peak was used to call mutations, we were able to pool pairs of samples and correctly identify mutations. Pooling of three samples did not lead to an adequately specific test for the basic automated allele-calling procedures that we used. We discuss methods by which the specificity may be improved to permit pooling of three or more samples when testing for mutations by sequencing.
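
As a rough illustration of how pool size trades off against test specificity, the sketch below assumes a classic two-stage (Dorfman-style) scheme: each pool of k samples is tested once, and every member of a positive pool is retested individually. The abstract does not give its formulae, so this model, the perfect-sensitivity assumption, and the function names are illustrative assumptions rather than the authors' method.

```python
def expected_tests_per_sample(k, freq, specificity):
    """Expected tests per sample for two-stage pooling with pools of size k.

    Assumptions (not taken from the abstract): perfect sensitivity,
    independent samples, and individual retesting of every member of a
    pool that tests positive.
    """
    if k < 1:
        raise ValueError("pool size must be at least 1")
    if k == 1:
        return 1.0  # no pooling: exactly one test per sample
    p_all_negative = (1.0 - freq) ** k
    # A pool tests positive if it contains a mutation, or through a
    # false positive when every sample in it is actually negative.
    p_pool_positive = (1.0 - p_all_negative) + (1.0 - specificity) * p_all_negative
    return 1.0 / k + p_pool_positive


def best_pool_size(freq, specificity, max_k=100):
    """Brute-force search for the pool size with the fewest expected tests."""
    return min(range(1, max_k + 1),
               key=lambda k: expected_tests_per_sample(k, freq, specificity))
```

Under these assumptions, a mutation frequency of 1% with a perfectly specific test gives an optimal pool size of 11, and for any pool of two or more samples a lower specificity raises the expected number of tests.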

10.
Next-generation sequencing of environmental samples can be challenging because of the variable DNA quantity and quality in these samples. High quality DNA libraries are needed for optimal results from next-generation sequencing. Environmental samples such as water may have low quality and quantities of DNA as well as contaminants that co-precipitate with DNA. The mechanical and enzymatic processes involved in extraction and library preparation may further damage the DNA. Gel size selection enables purification and recovery of DNA fragments of a defined size for sequencing applications. Nevertheless, this task is one of the most time-consuming steps in the DNA library preparation workflow. The protocol described here enables complete automation of agarose gel loading, electrophoretic analysis, and recovery of targeted DNA fragments. In this study, we describe a high-throughput approach to prepare high quality DNA libraries from freshwater samples that can be applied also to other environmental samples. We used an indirect approach to concentrate bacterial cells from environmental freshwater samples; DNA was extracted using a commercially available DNA extraction kit, and DNA libraries were prepared using a commercial transposon-based protocol. DNA fragments of 500 to 800 bp were gel size selected using Ranger Technology, an automated electrophoresis workstation. Sequencing of the size-selected DNA libraries demonstrated significant improvements to read length and quality of the sequencing reads.

11.
Whole exome sequencing (WES) is increasingly used in research and diagnostics. WES users expect coverage of the entire coding region of known genes as well as sufficient read depth for the covered regions. It is, however, unknown which recent WES platform is most suitable to meet these expectations. We present insights into the performance of the most recent standard exome enrichment platforms from Agilent, NimbleGen and Illumina applied to six different DNA samples by two sequencing vendors per platform. Our results suggest that both Agilent and NimbleGen overall perform better than Illumina and that the high enrichment performance of Agilent is stable among samples and between vendors, whereas NimbleGen is only able to achieve vendor- and sample-specific best exome coverage. Moreover, the recent Agilent platform overall captures more coding exons with sufficient read depth than NimbleGen and Illumina. Due to considerable gaps in effective exome coverage, however, the three platforms cannot capture all known coding exons alone or in combination, requiring improvement. Our data emphasize the importance of evaluation of updated platform versions and suggest that enrichment-free whole genome sequencing can overcome the limitations of WES in sufficiently covering coding exons, especially GC-rich regions, and in characterizing structural variants.

12.
Genomic sequencing of avian haemosporidian parasites (Haemosporida) has been challenging due to excessive contamination from host DNA. In this study, we developed a cost-effective protocol to obtain parasite sequences from naturally infected birds, based on targeted sequence capture and next generation sequencing. With the genomic data of Haemoproteus tartakovskyi as a reference, we successfully sequenced up to 1000 genes from each of the 15 selected samples belonging to nine different cytochrome b lineages, eight of which belong to Haemoproteus and one to Plasmodium. The targeted sequences were enriched to ~104-fold, and mixed infections were identified as well as the proportions of each mixed lineage. We found that the total number of reads and the proportions of exons sequenced decreased when the parasite lineage became more divergent from the reference genome. For each of the samples, the recovery of sequences from different exons varied with the function and GC content of the exon. From the obtained sequences, we detected within-lineage variation in both mitochondrial and nuclear genes, which may be a result of local adaptation to different host species and environmental conditions. This targeted sequence capture protocol can be applied to a broader range of species and will open a new door for further studies on disease diagnostics and comparative analysis of haemosporidian evolution.

13.
Sequencing pooled DNA of multiple individuals from a population instead of sequencing individuals separately has become popular due to its cost-effectiveness and simple wet-lab protocol, although some criticism of this approach remains. Here we validated a protocol for pooled whole-genome re-sequencing (Pool-seq) of Arabidopsis lyrata libraries prepared with low amounts of DNA (1.6 ng per individual). The validation was based on comparing single nucleotide polymorphism (SNP) frequencies obtained by pooling with those obtained by individual-based Genotyping By Sequencing (GBS). Furthermore, we investigated the effect of sample number, sequencing depth per individual and variant caller on population SNP frequency estimates. For Pool-seq data, we compared frequency estimates from two SNP callers, VarScan and Snape; the former employs a frequentist SNP calling approach while the latter uses a Bayesian approach. Results revealed concordance correlation coefficients well above 0.8, confirming that Pool-seq is a valid method for acquiring population-level SNP frequency data. Higher accuracy was achieved by pooling more samples (25 compared to 14) and working with higher sequencing depth (4.1× per individual compared to 1.4× per individual), which increased the concordance correlation coefficient to 0.955. The Bayesian-based SNP caller produced somewhat higher concordance correlation coefficients, particularly at low sequencing depth. We recommend pooling at least 25 individuals combined with sequencing at a depth of 100× to produce satisfactory frequency estimates for common SNPs (minor allele frequency above 0.05).
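
The agreement statistic used in this comparison can be sketched as follows, assuming Lin's concordance correlation coefficient computed with population (1/n) variances. The abstract does not state which estimator was used, so the choice of estimator and the function name are assumptions made for illustration.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two equal-length series.

    Unlike Pearson's r, the coefficient is also lowered by a systematic
    shift between the two series (the squared difference of the means
    appears in the denominator).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("coefficient is undefined for two identical constant series")
    return 2.0 * cov_xy / denominator
```

With x holding pooled allele-frequency estimates and y holding individual-based estimates for the same SNPs, a value close to 1 indicates that the two sets agree in both trend and absolute level.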

14.
Tumor specimens are often preserved as formalin-fixed paraffin-embedded (FFPE) tissue blocks, the most common clinical source for DNA sequencing. Herein, we evaluated the effect of pre-sequencing parameters to guide proper sample selection for targeted gene sequencing. Data from 113 FFPE lung tumor specimens were collected, and targeted gene sequencing was performed. Libraries were constructed using custom probes and were paired-end sequenced on a next generation sequencing platform. A PCR-based quality control (QC) assay was utilized to determine DNA quality, and a ratio was generated in comparison to control DNA. We observed that FFPE storage time, PCR/QC ratio, and DNA input in the library preparation were significantly correlated to most parameters of sequencing efficiency including depth of coverage, alignment rate, insert size, and read quality. A combined score using the three parameters was generated and proved highly accurate to predict sequencing metrics. We also showed wide read count variability within the genome, with worse coverage in regions of low GC content like in KRAS. Sample quality and GC content had independent effects on sequencing depth, and the worst results were observed in regions of low GC content in samples with poor quality. Our data confirm that FFPE samples are a reliable source for targeted gene sequencing in cancer, provided adequate sample quality controls are exercised. Tissue quality should be routinely assessed for pre-analytical factors, and sequencing depth may be limited in genomic regions of low GC content if suboptimal samples are utilized.

15.
Here we present an adaptation of NimbleGen 2.1M-probe array sequence capture for whole exome sequencing using the Illumina Genome Analyzer (GA) platform. The protocol involves two-stage library construction. The specificity of exome enrichment was approximately 80% with 95.6% even coverage of the 34 Mb target region at an average sequencing depth of 33-fold. Comparison of our results with whole genome shotgun resequencing results showed that the exome SNP calls gave only 0.97% false positive and 6.27% false negative variants. Our protocol is also well suited for use with whole genome amplified DNA. The results presented here indicate that there is a promising future for large-scale population genomics and medical studies using a whole exome sequencing approach.

16.
Despite the ever-increasing throughput and steadily decreasing cost of next generation sequencing (NGS), whole genome sequencing of humans is still not a viable option for the majority of genetics laboratories. This is particularly true in the case of complex disease studies, where large sample sets are often required to achieve adequate statistical power. To fully leverage the potential of NGS technology on large sample sets, several methods have been developed to selectively enrich for regions of interest. Enrichment reduces both monetary and computational costs compared to whole genome sequencing, while allowing researchers to take advantage of NGS throughput. Several targeted enrichment approaches are currently available, including molecular inversion probe ligation sequencing (MIPS), oligonucleotide hybridization based approaches, and PCR-based strategies. To assess how these methods performed when used in conjunction with the ABI SOLiD3+, we investigated three enrichment techniques: Nimblegen oligonucleotide hybridization array-based capture; Agilent SureSelect oligonucleotide hybridization solution-based capture; and Raindance Technologies' multiplexed PCR-based approach. Target regions were selected from exons and evolutionarily conserved areas throughout the human genome. Probe and primer pair design was carried out for all three methods using their respective informatics pipelines. In all, approximately 0.8 Mb of target space was identical for all 3 methods. SOLiD sequencing results were analyzed for several metrics, including consistency of coverage depth across samples, on-target versus off-target efficiency, allelic bias, and genotype concordance with array-based genotyping data. Agilent SureSelect exhibited superior on-target efficiency and correlation of read depths across samples. Nimblegen performance was similar at read depths at 20× and below. Both Raindance and Nimblegen SeqCap exhibited tighter distributions of read depth around the mean, but both suffered from lower on-target efficiency in our experiments. Raindance demonstrated the highest versatility in assay design.

17.
18.
Background. Wilson's disease (WD) is a rare inherited disorder caused by mutations in the ATP7B gene that result in copper accumulation in different organs. However, data on the ATP7B mutation spectrum in Russia and worldwide are insufficient and contradictory. The objective of the present study was to estimate the frequency of ATP7B gene mutations in the Russian population of WD patients. Materials and methods. 75 WD patients were examined by next-generation sequencing (NGS). A targeted panel, NimbleGen SeqCap EZ Choice: 151012_HG38_CysFib_EZ_HX3 (Roche), was designed for analysis of the ATP7B gene and possible modifier genes. Retrospective assessment of a diagnostic WD score (Leipzig, 2001) was also performed. Results. 31 mutations in the ATP7B gene were detected. The two most frequent mutations were c.3207C>A (51.85% of alleles) and c.3190G>A (8.64% of alleles). Single rare mutations were detected in 29% of cases. In 96% of cases, mutations in both copies of ATP7B were found. We also observed 3 novel potentially pathogenic variants that had not been described previously: c.1870-8A>G, c.3655A>T (p.Ile1219Phe), and c.3036dupC (p.Lys1013fs). For 25% of patients, the diagnosis of WD could not be established at the time of manifestation using the earlier proposed diagnostic score. There was a remarkable delay in diagnosis for the majority of patients: WD was diagnosed within three months of the first symptoms in only 33% of patients, within 3–12 months in 29%, within 1–10 years in 30%, and after more than 10 years in 8%. Generally, the clinical appearance of WD may be rather variable at manifestation, and genetic profiling at this stage is the only way to confirm the presence of WD.

19.

Background

Recent developments in deep (next-generation) sequencing technologies are significantly impacting medical research. The global analysis of protein coding regions in genomes of interest by whole exome sequencing is a widely used application. Many technologies for exome capture are commercially available; here we compare the performance of four of them: NimbleGen’s SeqCap EZ v3.0, Agilent’s SureSelect v4.0, Illumina’s TruSeq Exome, and Illumina’s Nextera Exome, all applied to the same human tumor DNA sample.

Results

Each capture technology was evaluated for its coverage of different exome databases, target coverage efficiency, GC bias, sensitivity in single nucleotide variant detection, sensitivity in small indel detection, and technical reproducibility. In general, all technologies performed well; however, our data demonstrated small, but consistent differences between the four capture technologies. Illumina technologies cover more bases in coding and untranslated regions. Furthermore, whereas most of the technologies provide reduced coverage in regions with low or high GC content, the Nextera technology tends to bias towards target regions with high GC content.

Conclusions

We show key differences in performance between the four technologies. Our data should help researchers who are planning exome sequencing to select appropriate exome capture technology for their particular application.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-449) contains supplementary material, which is available to authorized users.

20.
This study aimed to identify the underlying molecular genetic cause in four Spanish families clinically diagnosed with Retinitis Pigmentosa (RP), comprising one autosomal dominant RP (adRP), two autosomal recessive RP (arRP) and one with two possible modes of inheritance: arRP or X-Linked RP (XLRP). We performed whole exome sequencing (WES) using the NimbleGen SeqCap EZ Exome V3 sample preparation kit and the SOLiD 5500xl platform. All variants passing filter criteria were validated by Sanger sequencing to confirm familial segregation and absence from the local control population. This strategy allowed the detection of: (i) one novel heterozygous splice-site deletion in RHO, c.937-2_944del, (ii) one rare homozygous mutation in C2orf71, c.1795T>C; p.Cys599Arg, not previously associated with the disease, (iii) two heterozygous null mutations in ABCA4, c.2041C>T; p.R681* and c.6088C>T; p.R2030*, and (iv) one mutation, c.2405-2406delAG; p.Glu802Glyfs*31, in the ORF15 of RPGR. The molecular findings for RHO and C2orf71 confirmed the initial diagnosis of adRP and arRP, respectively, while patients with the two ABCA4 mutations, both previously associated with Stargardt disease, presented symptoms of RP with early macular involvement. Finally, X-linked inheritance was confirmed for the family with the RPGR mutation. This latter finding allowed the inclusion of carrier sisters in our preimplantation genetic diagnosis program.
