Similar articles
1.

Background

Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.

Results

Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur.

Conclusions

Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.

2.
To date we have little knowledge of how accurate next-generation sequencing (NGS) technologies are in sequencing repetitive sequences beyond known limitations to accurately sequence homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites) and no empirical study has compared and evaluated the performance of more than one NGS platform with the same dataset. Here we examined yeast microsatellite variants from both long-read (454-sequencing) and short-read (Illumina) NGS platforms and compared these to data derived through Sanger sequencing. In addition, we investigated any locus-specific biases and differences that might have resulted from variability in microsatellite repeat number, repeat motif or type of mutation. Out of 112 insertion/deletion variants identified among 45 microsatellite amplicons in our study, we found 87.5% agreement between the 454-platform and Sanger sequencing in frequency of variant detection after Benjamini-Hochberg correction for multiple tests. For a subset of 21 microsatellite amplicons derived from Illumina sequencing, the results of short-read platform were highly consistent with the other two platforms, with 100% agreement with 454-sequencing and 93.6% agreement with the Sanger method after Benjamini-Hochberg correction. We found that the microsatellite attributes copy number, repeat motif and type of mutation did not have a significant effect on differences seen between the sequencing platforms. We show that both long-read and short-read NGS platforms can be used to sequence short tandem repeats accurately, which makes it feasible to consider the use of these platforms in high-throughput genotyping. It appears the major requirement for achieving both high accuracy and rare variant detection in microsatellite genotyping is sufficient read depth coverage. 
This might be a challenge because each platform generates a consistent pattern of non-uniform sequence coverage, which, as our study suggests, may affect some types of tandem repeats more than others.
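
The abstract above reports agreement between platforms after Benjamini-Hochberg correction for multiple tests. As a reminder of what that correction does, here is a minimal sketch of the standard step-up procedure. The function name, the `alpha` default of 0.05, and the example p-values are illustrative assumptions; the abstract does not state the significance level or the test statistics the authors used.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True if it is significant at FDR level alpha.

    alpha=0.05 is an assumed default, not a value taken from the abstract.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


if __name__ == "__main__":
    # Hypothetical p-values, one per microsatellite variant comparison.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example))
```

Because the rule is step-up, a p-value that misses its own threshold can still be declared significant when a larger p-value passes at a higher rank.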

3.
DNA barcoding is an efficient method to identify specimens and to detect undescribed/cryptic species. Sanger sequencing of individual specimens is the standard approach in generating large‐scale DNA barcode libraries and identifying unknowns. However, the Sanger sequencing technology is, in some respects, inferior to next‐generation sequencers, which are capable of producing millions of sequence reads simultaneously. Additionally, direct Sanger sequencing of DNA barcode amplicons, as practiced in most DNA barcoding procedures, is hampered by the need for relatively high target amplicon yield, coamplification of nuclear mitochondrial pseudogenes, confusion with sequences from intracellular endosymbiotic bacteria (e.g. Wolbachia) and instances of intraindividual variability (i.e. heteroplasmy). Any of these situations can lead to failed Sanger sequencing attempts or ambiguity of the generated DNA barcodes. Here, we demonstrate the potential application of next‐generation sequencing platforms for parallel acquisition of DNA barcode sequences from hundreds of specimens simultaneously. To facilitate retrieval of sequences obtained from individual specimens, we tag individual specimens during PCR amplification using unique 10‐mer oligonucleotides attached to DNA barcoding PCR primers. We employ 454 pyrosequencing to recover full‐length DNA barcodes of 190 specimens using 12.5% capacity of a 454 sequencing run (i.e. two lanes of a 16‐lane run). We obtained an average of 143 sequence reads for each individual specimen. The sequences produced are full‐length DNA barcodes for all but one of the included specimens. In a subset of samples, we also detected Wolbachia, nontarget species, and heteroplasmic sequences. Next‐generation sequencing is of great value because of its protocol simplicity, greatly reduced cost per barcode read, faster throughput and added information content.
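
The tagging step described above (a unique 10‐mer attached to each specimen's PCR primers) implies a demultiplexing step after sequencing. The sketch below shows one simple way such reads could be sorted back to specimens. It assumes the tag sits at the very start of each read and must match exactly; the tag sequences, specimen names, and function name are hypothetical, and the abstract does not describe the authors' actual software or its error tolerance.

```python
from collections import defaultdict

TAG_LENGTH = 10  # the abstract describes unique 10-mer tags


def demultiplex(reads, tag_to_specimen, tag_length=TAG_LENGTH):
    """Group reads by their leading tag.

    Returns (reads per specimen with the tag removed, reads with no known tag).
    Assumes exact tag matches at the 5' end, which is a simplification.
    """
    by_specimen = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        specimen = tag_to_specimen.get(tag)
        if specimen is None:
            unassigned.append(read)
        else:
            by_specimen[specimen].append(insert)
    return dict(by_specimen), unassigned


if __name__ == "__main__":
    # Hypothetical tags and reads, for illustration only.
    tags = {"ACGTACGTAC": "specimen_A", "TTGCAATTGC": "specimen_B"}
    reads = ["ACGTACGTACGGGG", "TTGCAATTGCCCCC", "ACGTACGTACTTTT", "NNNNNNNNNNAAAA"]
    assigned, unassigned = demultiplex(reads, tags)
    print(assigned)
    print(unassigned)
```

A production workflow would also need to remove primer sequence and decide how to treat tags that contain sequencing errors; both are left out here.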

4.
Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease-causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease-causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.

5.
Characterization and population genetic analysis of multilocus genes, such as those found in the major histocompatibility complex (MHC), is challenging in nonmodel vertebrates. The traditional method of extensive cloning and Sanger sequencing is costly and time‐intensive, and indirect methods of assessment often underestimate total variation. Here, we explored the suitability of 454 pyrosequencing for characterizing multilocus genes for use in population genetic studies. We compared two sample tagging protocols and two bioinformatic procedures for 454 sequencing through characterization of a 185‐bp fragment of MHC DRB exon 2 in wolverines (Gulo gulo) and further compared the results with those from cloning and Sanger sequencing. We found 10 putative DRB alleles in the 88 individuals screened with between two and four alleles per individual, suggesting amplification of a duplicated DRB gene. In addition to the putative alleles, all individuals possessed an easily identifiable pseudogene. In our system, sequence variants with a frequency below 6% in an individual sample were usually artefacts. However, we found that sample preparation and data processing procedures can greatly affect variant frequencies in addition to the complexity of the multilocus system. Therefore, we recommend determining a per‐amplicon‐variant frequency threshold for each unique system. The extremely deep coverage obtained in our study (approximately 5000×) coupled with the semi‐quantitative nature of pyrosequencing enabled us to assign all putative alleles to the two DRB loci, which is generally not possible using traditional methods. Our method of obtaining locus‐specific MHC genotypes will enhance population genetic analyses and studies on disease susceptibility in nonmodel wildlife species.
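
The per‐amplicon frequency threshold mentioned above can be expressed as a simple filter on variant read counts within one individual. The sketch below uses the 6% figure from the abstract as a default, but leaves it as a parameter because the authors recommend calibrating the threshold for each system. The variant labels and read counts are invented for illustration.

```python
def filter_variants(read_counts, min_frequency=0.06):
    """Keep variants whose within-sample read frequency reaches min_frequency.

    read_counts maps a variant label to its read count in one individual.
    The 0.06 default mirrors the abstract; other systems may need another value.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    frequencies = {variant: count / total for variant, count in read_counts.items()}
    return {v: f for v, f in frequencies.items() if f >= min_frequency}


if __name__ == "__main__":
    # Hypothetical counts at roughly 5000x coverage for one individual.
    counts = {"variant_1": 2500, "variant_2": 2200, "variant_3": 250, "variant_4": 50}
    print(filter_variants(counts))
```

With these made-up counts, the two variants at 5% and 1% are dropped as likely artefacts, while the two common variants are retained as putative alleles.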

6.
High‐throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods provide vast amounts of data, which need to be analysed in several steps. Although the bioinformatics may be applied using several public tools, many analytical pipelines allow too few options for the optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user‐friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable.

7.
DNA barcoding has become one of the most important techniques in plant species identification. Successful application of this technology is dependent on the availability of a reference database with high species coverage. Unfortunately, there are experimental and data processing challenges in constructing such a library within a short time. Here, we present our solutions to these challenges. We sequenced six conventional DNA barcode fragments (ITS1, ITS2, matK1, matK2, rbcL1, and rbcL2) of 380 flowering plants on next‐generation sequencing (NGS) platforms (Illumina HiSeq 2500 and Ion Torrent S5) and the Sanger sequencing platform. After comparing the sequencing depths, read lengths, base qualities, and base accuracies, we conclude that an Illumina HiSeq 2500 PE250 run is suitable for conventional DNA barcoding. We developed a new “Cotu” method to create consensus sequences from NGS reads for longer output sequences and more reliable bases than the other three methods. Step‐by‐step instructions for our method are provided. By using high‐throughput machines (PCR and NGS), labeling PCR, and the Cotu method, it is possible to significantly reduce the cost and labor investments for DNA barcoding. A regional or even global DNA barcoding reference library with high species coverage is likely to be constructed in a few years.

8.
High‐throughput sequencing methods for genotyping genome‐wide markers are being rapidly adopted for phylogenetics of nonmodel organisms in conservation and biodiversity studies. However, the reproducibility of SNP genotyping and degree of marker overlap or compatibility between datasets from different methodologies have not been tested in nonmodel systems. Using double‐digest restriction site‐associated DNA sequencing, we sequenced a common set of 22 specimens from the butterfly genus Speyeria on two different Illumina platforms, using two variations of library preparation. We then used a de novo approach to bioinformatic locus assembly and SNP discovery for subsequent phylogenetic analyses. We found a high rate of locus recovery despite differences in library preparation and sequencing platforms, as well as overall high levels of data compatibility after data processing and filtering. These results provide the first application of NGS methods for phylogenetic reconstruction in Speyeria and support the use and long‐term viability of SNP genotyping applications in nonmodel systems.

9.
The advent of next‐generation sequencing (NGS) technologies has transformed the way microsatellites are isolated for ecological and evolutionary investigations. Recent attempts to employ NGS for microsatellite discovery have used the 454, Illumina, and Ion Torrent platforms, but other methods including single‐molecule real‐time DNA sequencing (Pacific Biosciences or PacBio) remain viable alternatives. We outline a workflow from sequence quality control to microsatellite marker validation in three plant species using PacBio circular consensus sequencing (CCS). We then evaluate the performance of PacBio CCS in comparison with other NGS platforms for microsatellite isolation, through simulations that focus on variations in read length, read quantity and sequencing error rate. Although quality control of CCS reads reduced microsatellite yield by around 50%, hundreds of microsatellite loci that are expected to have improved conversion efficiency to functional markers were retrieved for each species. The simulations quantitatively validate the advantages of long reads and emphasize the detrimental effects of sequencing errors on NGS‐enabled microsatellite development. In view of the continuing improvement in read length on NGS platforms, sequence quality and the corresponding strategies of quality control will become the primary factors to consider for effective microsatellite isolation. Among current options, PacBio CCS may be optimal for rapid, small‐scale microsatellite development due to its flexibility in scaling sequencing effort, while platforms such as Illumina MiSeq will provide cost‐efficient solutions for multispecies microsatellite projects.

10.
11.
12.
We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non-amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
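
To make the idea of estimating error from duplicate reads more concrete, the following is a deliberately simplified sketch. It is not the published DRISEE algorithm: it merely assumes that reads sharing an identical prefix are duplicates of one template, takes the most common base at each later position as the consensus, and counts disagreements. The prefix length, the minimum bin size, and the toy reads are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def duplicate_read_error_profile(reads, prefix_length=4, min_bin_size=3):
    """Estimate a per-position mismatch rate from bins of presumed duplicate reads.

    Reads are binned by an identical prefix (an assumed proxy for duplication).
    Positions inside the prefix are skipped because they match by construction.
    """
    bins = defaultdict(list)
    for read in reads:
        if len(read) > prefix_length:
            bins[read[:prefix_length]].append(read)

    mismatches = Counter()
    observations = Counter()
    for members in bins.values():
        if len(members) < min_bin_size:
            continue  # too few reads to form a trustworthy consensus
        shared_length = min(len(r) for r in members)
        for pos in range(prefix_length, shared_length):
            column = Counter(r[pos] for r in members)
            consensus_count = column.most_common(1)[0][1]
            mismatches[pos] += len(members) - consensus_count
            observations[pos] += len(members)

    return {pos: mismatches[pos] / observations[pos] for pos in sorted(observations)}


if __name__ == "__main__":
    toy_reads = ["ACGTAAAA", "ACGTAAAA", "ACGTAATA", "TTTTCCCC"]
    print(duplicate_read_error_profile(toy_reads))
```

A positional profile of this kind is what would inform read trimming, while averaging it over positions gives a single whole-sample figure for comparing samples.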

13.
The development of microsatellite loci has become more efficient using next‐generation sequencing (NGS) approaches, and many studies imply that the number of applicable loci is large. However, few studies have sought to quantify the number of loci that are retained for use out of the thousands of sequence reads initially obtained. We analyzed the success rate of microsatellite loci development for three amphibian species using a 454 NGS approach on tetra‐nucleotide motif‐enriched species‐specific libraries. The number of sequence reads obtained differed strongly between species and ranged from 19,562 for Triturus cristatus to 55,626 for Lissotriton helveticus, with 52,075 reads obtained for Calotriton asper. PHOBOS was used to identify sequences with tetra‐nucleotide repeat motifs with a minimum repeat number of ten and high quality primer binding sites. Of 107 sequences for T. cristatus, 316 for C. asper and 319 for L. helveticus, we tested the amplification success, polymorphism, and degree of heterozygosity for 41 primer combinations each for C. asper and T. cristatus, and 22 for L. helveticus. We found 11 polymorphic loci for T. cristatus, 20 loci for C. asper, and 15 loci for L. helveticus. Extrapolated, the number of potentially amplifiable loci (PALs) resulted in estimated species‐specific success rates of 0.15% (T. cristatus), 0.30% (C. asper), and 0.39% (L. helveticus). Compared with representative Illumina NGS approaches, our applied 454‐sequencing approach on specifically enriched sublibraries proved to be quite competitive in terms of success rates and number of finally applicable loci.
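
The extrapolation behind the reported success rates is not spelled out in the abstract. One reading that reproduces all three figures is: scale the number of candidate sequences by the fraction of tested primer pairs that yielded polymorphic loci, then divide by the total number of reads. The sketch below works through that arithmetic with the numbers given above; the function name and the formula itself are an assumed reconstruction, not a statement of the authors' method.

```python
def pal_success_rate(candidate_sequences, primers_tested, polymorphic_loci, total_reads):
    """Estimated percentage of reads that end up as usable polymorphic loci."""
    tested_fraction = polymorphic_loci / primers_tested
    estimated_usable_loci = candidate_sequences * tested_fraction
    return 100.0 * estimated_usable_loci / total_reads


# (candidate sequences, primer pairs tested, polymorphic loci, total reads),
# all taken from the abstract.
SPECIES = {
    "T. cristatus": (107, 41, 11, 19562),
    "C. asper": (316, 41, 20, 52075),
    "L. helveticus": (319, 22, 15, 55626),
}

if __name__ == "__main__":
    for name, values in SPECIES.items():
        print(f"{name}: {pal_success_rate(*values):.2f}%")
```

Under this reading, the estimated numbers of usable loci are roughly 29, 154 and 218, and the resulting percentages round to the 0.15%, 0.30% and 0.39% quoted in the abstract.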

14.
Microsatellite marker development has been greatly simplified by the use of high‐throughput sequencing followed by in silico microsatellite detection and primer design. However, the selection of markers designed by the existing pipelines depends either on arbitrary criteria or on older studies of PCR success. Based on wet laboratory experiments, we have identified the following factors that are most likely to influence genotyping success rate: alignment score between the primers and the amplicon; the distance between primers and microsatellites; the length of the PCR product; target region complexity and the number of reads underlying the sequence. The QDD pipeline has been modified to include these most pertinent factors in the output to help the selection of markers. Furthermore, new features are also included in the present version: (i) not only raw sequencing reads are accepted as input, but also contigs, allowing the analysis of assembled high‐coverage data; (ii) input data can be both in fasta and fastq format to facilitate the use of Illumina and IonTorrent reads; (iii) a comparison to known transposable elements allows their detection; (iv) a contamination check can be carried out by BLASTing potential markers against the nucleotide (nt) database of NCBI; (v) QDD3 is now also available embedded into a virtual machine, making installation easier and operating system independent. It can be used both as a command‐line version and integrated into a Galaxy server, providing a user‐friendly interface, as well as the possibility to utilize a large variety of NGS tools.

15.
Next generation sequencing (NGS) has been applied in fields ranging from agriculture to clinical research, and many sequencing platforms are available for obtaining accurate and consistent results. However, these platforms show amplification bias that affects variant calls in personal genomes. Here, we sequenced whole genomes and whole exomes from ten Korean individuals using Illumina and Ion Proton, respectively, to assess the vulnerability and accuracy of each NGS platform in GC-rich and GC-poor regions. Overall, a total of 1013 Gb of reads from Illumina and ~39.1 Gb of reads from Ion Proton were analyzed using the BWA-GATK variant calling pipeline. Furthermore, in conjunction with the VQSR tool and detailed filtering strategies, we obtained high-quality variants. Finally, ten variants each from the Illumina-only, Ion Proton-only, and intersection sets were selected for Sanger validation. The validation results revealed that the Illumina platform showed higher accuracy than Ion Proton. The described filtering methods are advantageous for large population-based whole genome studies designed to identify common and rare variations associated with complex diseases.

16.
The identification of mutations in targeted genes has been significantly simplified by the advent of TILLING (Targeting Induced Local Lesions In Genomes), speeding up the functional genomic analysis of animals and plants. Next‐generation sequencing (NGS) is gradually replacing classical TILLING for mutation detection, as it allows the analysis of a large number of amplicons in a short time. The NGS approach was used to identify mutations in a population of Solanum lycopersicum (tomato) that was doubly mutagenized by ethylmethane sulphonate (EMS). Twenty‐five genes belonging to carotenoid and folate metabolism were PCR‐amplified and screened to identify potentially beneficial alleles. To augment efficiency, the 600‐bp amplicons were directly sequenced in a non‐overlapping manner in Illumina MiSeq, obviating the need for a fragmentation step before library preparation. A comparison of the different pooling depths revealed that heterozygous mutations could be identified up to 128‐fold pooling. An evaluation of six different software programs (CAMBa, CRISP, GATK Unified Genotyper, LoFreq, SNVer and ViPR) revealed that no software program was robust enough to predict mutations with high fidelity. Among these, CRISP and CAMBa predicted mutations with lower false discovery rates. The false positives were largely eliminated by considering only mutations commonly predicted by two different software programs. The screening of 23.47 Mb of tomato genome yielded 75 predicted mutations, 64 of which were confirmed by Sanger sequencing, with an average mutation density of 1/367 kb. Our results indicate that NGS combined with multiple variant detection tools can reduce false positives and significantly speed up the mutation discovery rate.
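
Two quantitative points in the abstract above lend themselves to a short worked example: keeping only mutations predicted by two different programs, and the mutation density derived from 64 confirmed mutations in 23.47 Mb. The sketch below assumes each tool's output has already been reduced to a collection of (gene, position, reference base, alternate base) tuples; that representation, the function names, the "at least two tools" reading, and the example calls are hypothetical.

```python
from collections import Counter


def consensus_calls(calls_by_tool, min_tools=2):
    """Return the variants reported by at least min_tools different tools."""
    support = Counter()
    for calls in calls_by_tool.values():
        # set() ensures each tool counts at most once per variant.
        for variant in set(calls):
            support[variant] += 1
    return {variant for variant, n in support.items() if n >= min_tools}


def kb_per_mutation(screened_bp, confirmed_mutations):
    """Average number of kilobases screened per confirmed mutation."""
    return screened_bp / confirmed_mutations / 1000


if __name__ == "__main__":
    # Hypothetical calls from three tools on pooled amplicon data.
    calls = {
        "tool_a": [("gene1", 120, "G", "A"), ("gene2", 45, "C", "T")],
        "tool_b": [("gene1", 120, "G", "A"), ("gene3", 300, "G", "A")],
        "tool_c": [("gene2", 45, "C", "T"), ("gene4", 10, "C", "T")],
    }
    print(sorted(consensus_calls(calls)))
    # Figures from the abstract: 23.47 Mb screened, 64 confirmed mutations.
    print(round(kb_per_mutation(23_470_000, 64)))
```

The density calculation gives about 366.7 kb per confirmed mutation, which rounds to the 1/367 kb stated in the abstract.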

17.
The development and screening of microsatellite markers have been accelerated by next‐generation sequencing (NGS) technology and in particular GS‐FLX pyro‐sequencing (454). More recent platforms such as the PGM semiconductor sequencer (Ion Torrent) offer potential benefits such as dramatic reductions in cost, but to date have not been well utilized. Here, we critically compare the advantages and disadvantages of microsatellite development using PGM semiconductor sequencing and GS‐FLX pyro‐sequencing for two gymnosperm (a conifer and a cycad) and one angiosperm species. We show that these NGS platforms differ in the quantity of returned sequence data, unique microsatellite data and primer design opportunities, mostly consistent with the differences in read length. The strength of the PGM lies in the large amount of data generated at comparatively lower cost and in less time. The strength of GS‐FLX lies in the return of longer average length sequences and therefore greater flexibility in producing markers with variable product length, due to longer flanking regions, which is ideal for capillary multiplexing. These differences need to be considered when choosing an NGS method for microsatellite discovery. However, the ongoing improvement in read lengths of the NGS platforms will reduce the disadvantage of the current short read lengths, particularly for the PGM platform, allowing greater flexibility in primer design coupled with the power of a larger number of sequences.

18.

Background  

Recent advances in sequencing strategies make possible unprecedented depth and scale of sampling for molecular detection of microbial diversity. Two major paradigm-shifting discoveries include the detection of bacterial diversity that is one to two orders of magnitude greater than previous estimates, and the discovery of an exciting 'rare biosphere' of molecular signatures ('species') of poorly understood ecological significance. We applied a high-throughput parallel tag sequencing (454 sequencing) protocol adopted for eukaryotes to investigate protistan community complexity in two contrasting anoxic marine ecosystems (Framvaren Fjord, Norway; Cariaco deep-sea basin, Venezuela). Both sampling sites have previously been scrutinized for protistan diversity by traditional clone library construction and Sanger sequencing. By comparing these clone library data with 454 amplicon library data, we assess the efficiency of high-throughput tag sequencing strategies. We here present a novel, highly conservative bioinformatic analysis pipeline for the processing of large tag sequence data sets.

19.
HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior to treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 co-receptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels, suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.
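
Of the three inference methods named above, the 11/24/25 rule is simple enough to sketch directly. As commonly described, it flags a virus as non-R5 when the V3 loop carries a positively charged residue (arginine or lysine) at position 11, 24 or 25. The code below assumes that formulation and assumes the input is an ungapped amino acid sequence already numbered against a 35-residue V3 reference; the abstract does not give the exact implementation used in the study, and the example sequences are illustrative only.

```python
POSITIVELY_CHARGED = {"R", "K"}  # assumed definition: arginine or lysine
RULE_POSITIONS = (11, 24, 25)    # 1-based V3 positions named by the rule


def predicts_non_r5(v3_amino_acids, positions=RULE_POSITIONS):
    """Return True if any rule position holds a positively charged residue.

    Assumes the sequence is already aligned so that its index matches V3
    numbering; real data with insertions or deletions needs alignment first.
    """
    sequence = v3_amino_acids.upper()
    return any(
        sequence[pos - 1] in POSITIVELY_CHARGED
        for pos in positions
        if pos <= len(sequence)
    )


if __name__ == "__main__":
    # Illustrative 35-residue V3 sequences, not data from the study.
    r5_like = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
    x4_like = "CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"  # S to R at position 11
    print(predicts_non_r5(r5_like), predicts_non_r5(x4_like))
```

In a deep sequencing setting, a rule like this would be applied to each translated read or variant, and the fraction of reads flagged as non-R5 would then be compared against whatever minority-variant cutoff the study adopts.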

20.