Similar articles
1.

Background

DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads and suggest possible improvements.

Results

In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.
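The thresholding idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes substitution-only (Hamming) matching, a null set of distances taken from reads known to be unbarcoded, and a null proportion of 1 (a conservative simplification). All function names are hypothetical.

```python
def min_window_distance(read, barcode):
    """Smallest Hamming distance between the barcode and any same-length window of the read."""
    k = len(barcode)
    if len(read) < k:
        return k  # read too short to contain the barcode: treat as maximally distant
    return min(
        sum(a != b for a, b in zip(read[i:i + k], barcode))
        for i in range(len(read) - k + 1)
    )


def tail_fdr(observed, null, threshold):
    """Empirical tail-area FDR estimate for calling reads with distance <= threshold.

    Assumption: the proportion of unbarcoded reads is taken as 1 (conservative).
    """
    obs_tail = sum(d <= threshold for d in observed) / len(observed)
    null_tail = sum(d <= threshold for d in null) / len(null)
    if obs_tail == 0:
        return 1.0  # no reads are called at this threshold, so mark it as unusable
    return min(1.0, null_tail / obs_tail)


def max_acceptable_distance(observed, null, alpha, max_distance):
    """Largest distance threshold whose estimated FDR stays at or below alpha (None if none)."""
    best = None
    for t in range(max_distance + 1):
        if tail_fdr(observed, null, t) <= alpha:
            best = t
    return best
```

Raising `alpha` admits larger distances, which trades precision for sensitivity; this is the trade-off the abstract refers to.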

Conclusion

Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow a proper trade-off between sensitivity and precision to be chosen.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-264) contains supplementary material, which is available to authorized users.

2.

Background

PCR amplicon sequencing has been widely used as a targeted approach for both DNA and RNA sequence analysis. High multiplex PCR has further enabled the enrichment of hundreds of amplicons in one simple reaction. At the same time, the performance of PCR amplicon sequencing can be negatively affected by issues such as high duplicate reads, polymerase artifacts and PCR amplification bias. Recently researchers have made some good progress in addressing these shortcomings by incorporating molecular barcodes into PCR primer design. So far, most work has been demonstrated using one to a few pairs of primers, which limits the size of the region one can analyze.

Results

We developed a simple protocol, which enables the use of molecular barcodes in high multiplex PCR with hundreds of amplicons. Using this protocol and reference materials, we demonstrated the applications in accurate variant calling at very low fraction over a large region and in targeted RNA quantification. We also evaluated the protocol’s utility in profiling FFPE samples.

Conclusions

We demonstrated the successful implementation of molecular barcodes in high multiplex PCR, with multiplex scale many times higher than earlier work. We showed that the new protocol combines the benefits of both high multiplex PCR and molecular barcodes, i.e. the analysis of a very large region, low DNA input requirement, very good reproducibility and the ability to detect mutations at fractions as low as 1% with minimal false positives (FP).
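The abstract does not describe the analysis pipeline, so the following Python sketch is only a generic illustration of why molecular barcodes help with duplicate reads and polymerase artifacts: reads sharing a barcode are assumed to derive from one original molecule and are collapsed to a per-position majority consensus. The function name and input layout are hypothetical.

```python
from collections import Counter, defaultdict


def collapse_by_molecular_barcode(reads):
    """Collapse reads from one amplicon into one consensus sequence per molecular barcode.

    reads: iterable of (molecular_barcode, sequence) pairs.
    Returns a dict mapping each barcode to its majority-vote consensus.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        length = min(len(s) for s in seqs)  # compare only positions covered by every read
        consensus[barcode] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus
```

The number of keys in the result approximates the number of distinct input molecules, and an error seen in a minority of reads within one family is voted out rather than reported as a variant.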

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1806-8) contains supplementary material, which is available to authorized users.

3.

Background

Dinoflagellates are an ecologically important group of protists with important functions as primary producers, coral symbionts and in toxic red tides. Although widely studied, the natural diversity of dinoflagellates is not well known. DNA barcoding has been utilized successfully for many protist groups. We used this approach to systematically sample known “species”, as a reference to measure the natural diversity in three marine environments.

Methodology/Principal Findings

In this study, we assembled a large cytochrome c oxidase 1 (COI) barcode database from 8 public algal culture collections plus 3 private collections worldwide resulting in 336 individual barcodes linked to specific cultures. We demonstrate that COI can identify to the species level in 15 dinoflagellate genera, generally in agreement with existing species names. Exceptions were found in species belonging to genera that were generally already known to be taxonomically challenging, such as Alexandrium or Symbiodinium. Using this barcode database as a baseline for cultured dinoflagellate diversity, we investigated the natural diversity in three diverse marine environments (Northeast Pacific, Northwest Atlantic, and Caribbean), including an evaluation of single-cell barcoding to identify uncultivated groups. From all three environments, the great majority of barcodes were not represented by any known cultured dinoflagellate, and we also observed an explosion in the diversity of genera that previously contained a modest number of known species, belonging to Kareniaceae. In total, 91.5% of non-identical environmental barcodes represent distinct species, but only 51 out of 603 unique environmental barcodes could be linked to cultured species using a conservative cut-off based on distances between cultured species.

Conclusions/Significance

COI barcoding was successful in identifying species from 70% of cultured genera. When applied to environmental samples, it revealed a massive amount of natural diversity in dinoflagellates. This highlights the extent to which we underestimate microbial diversity in the environment.

4.

Background

A minor but significant fraction of samples subjected to next-generation sequencing methods are either mixed-up or cross-contaminated. These events can lead to false or inconclusive results. We have therefore developed SASI-Seq, a process whereby a set of uniquely barcoded DNA fragments are added to samples destined for sequencing. From the final sequencing data, one can verify that all the reads derive from the original sample(s) and not from contaminants or other samples.

Results

By adding a mixture of three uniquely barcoded amplicons, of different sizes spanning the range of insert sizes one would normally use for Illumina sequencing, at a spike-in level of approximately 0.1%, we demonstrate that these fragments remain intimately associated with the sample. They can be detected following even the tightest size selection regimes or exome enrichment and can report the occurrence of sample mix-ups and cross-contamination. As a consequence of this work, we have designed a set of 384 eleven-base Illumina barcode sequences that are at least 5 changes apart from each other, allowing for single-error correction and very low levels of barcode misallocation due to sequencing error.
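The relationship between minimum barcode separation and error correction can be sketched in Python as follows. This is an illustration only: it assumes that "changes" means substitutions (Hamming distance), whereas the published design may also account for insertions and deletions. The toy barcodes in the usage note and all function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_barcode(observed, barcodes, max_errors=1):
    """Return the single barcode within max_errors substitutions of the read, else None."""
    hits = [b for b in barcodes if hamming(observed, b) <= max_errors]
    return hits[0] if len(hits) == 1 else None
```

With a toy set such as `["AAAAAAAAAAA", "CCCCCAAAAAA", "AAAAAACCCCC", "GGGGGGGGGGG"]`, the minimum pairwise distance is 5, so an observed tag with one substitution is still closest to exactly one barcode, while a tag two or more substitutions away from every barcode is left unassigned rather than misallocated.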

Conclusion

SASI-Seq is a simple, inexpensive and flexible tool that enables sample assurance, allows deconvolution of sample mix-ups and reports levels of cross-contamination between samples throughout NGS workflows.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-110) contains supplementary material, which is available to authorized users.

5.
Wilson JJ. PLoS ONE 2011, 6(9): e24769

Background

A common perception is that DNA barcode datamatrices have limited phylogenetic signal due to the small number of characters available per taxon. However, another school of thought suggests that the massively increased taxon sampling afforded through the use of DNA barcodes may considerably increase the phylogenetic signal present in a datamatrix. Here I test this hypothesis using a large dataset of macrolepidopteran DNA barcodes.

Methodology/Principal Findings

Taxon sampling was systematically increased in datamatrices containing macrolepidopteran DNA barcodes. Sixteen family groups were designated as concordance groups, and two quantitative measures, the taxon consistency index and the taxon retention index, were used to assess any changes in phylogenetic signal as a result of the increase in taxon sampling. DNA barcodes alone, even with maximal taxon sampling (500 species per family), were not sufficient to reconstruct monophyly of families, and increased taxon sampling generally increased the number of clades formed per family. However, the scores indicated a similar level of taxon retention (species from a family clustering together) in the cladograms as the number of species included in the datamatrix was increased, suggesting substantial phylogenetic signal below the ‘family’ branch.

Conclusions/Significance

The development of supermatrix, supertree or constrained tree approaches could enable the exploitation of the massive taxon sampling afforded through DNA barcodes for phylogenetics, connecting the twigs resolved by barcodes to the deep branches resolved through phylogenomics.

6.

Background

The increasing availability of reference libraries of DNA barcodes (RLDB) offers the opportunity to screen the level of consistency in DNA barcode data among libraries, in order to detect possible disagreements generated from taxonomic uncertainty or operational shortcomings. We propose a ranking system to attribute a confidence level to species identifications associated with DNA barcode records from an RLDB. Here we apply the proposed ranking system to a newly generated RLDB for marine fish of Portugal.

Methodology/Principal Findings

Specimens (n = 659) representing 102 marine fish species were collected along the continental shelf of Portugal, morphologically identified and archived in a museum collection. Samples were sequenced at the barcode region of the cytochrome oxidase subunit I gene (COI-5P). Resultant DNA barcodes had average intra-specific and inter-specific Kimura-2-parameter distances (0.32% and 8.84%, respectively) within the range usually observed for marine fishes. All specimens were ranked in five different levels (A–E), according to the reliability of the match between their species identification and the respective diagnostic DNA barcodes. Grades A to E were attributed upon submission of individual specimen sequences to BOLD-IDS and inspection of the clustering pattern in the NJ tree generated. Overall, our study resulted in 73.5% of unambiguous species IDs (grade A), 7.8% taxonomically congruent barcode clusters within our dataset, but awaiting external confirmation (grade B), and 18.7% of species identifications with lower levels of reliability (grades C/E).
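For reference, the Kimura 2-parameter distance mentioned above is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The Python sketch below applies that formula; skipping sites with gaps or ambiguous bases is an assumption of this sketch, not a detail taken from the study, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a character other than A, C, G or T are skipped.
    Raises ValueError from math.log/math.sqrt if the sequences are too divergent.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Averaging this quantity over all within-species pairs and over all between-species pairs gives figures of the kind reported in the abstract.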

Conclusion/Significance

We highlight the importance of implementing a system to rank barcode records in RLDBs, in order to flag taxa in need of taxonomic revision, or reduce ambiguities of discordant data. With increasing numbers of DNA barcode records publicly available, this cross-validation system would provide a metric of relative accuracy of barcodes, while enabling the continuous revision and annotation required in taxonomic work.

7.

Background

Poorly regulated international trade in ornamental fishes poses risks to both biodiversity and economic activity via invasive alien species and exotic pathogens. Border security officials need robust tools to confirm identifications, often requiring hard-to-obtain taxonomic literature and expertise. DNA barcoding offers a potentially attractive tool for quarantine inspection, but has yet to be scrutinised for aquarium fishes. Here, we present a barcoding approach for ornamental cyprinid fishes by: (1) expanding current barcode reference libraries; (2) assessing barcode congruence with morphological identifications under numerous scenarios (e.g. inclusion of GenBank data, presence of singleton species, choice of analytical method); and (3) providing supplementary information to identify difficult species.

Methodology/Principal Findings

We sampled 172 ornamental cyprinid fish species from the international trade, and provide data for 91 species currently unrepresented in reference libraries (GenBank/BOLD). DNA barcodes were found to be highly congruent with our morphological assignments, achieving success rates of 90–99%, depending on the method used (neighbour-joining monophyly, bootstrap, nearest neighbour, GMYC, percent threshold). Inclusion of data from GenBank (additional 157 spp.) resulted in a more comprehensive library, but at a cost to success rate due to the increased number of singleton species. In addition to DNA barcodes, our study also provides supporting data in the form of specimen images, morphological characters, taxonomic bibliography, preserved vouchers, and nuclear rhodopsin sequences. Using this nuclear rhodopsin data we also uncovered evidence of interspecific hybridisation, and highlighted unrecognised diversity within popular aquarium species, including the endangered Indian barb Puntius denisonii.

Conclusions/Significance

We demonstrate that DNA barcoding provides a highly effective biosecurity tool for rapidly identifying ornamental fishes. In cases where DNA barcodes are unable to offer an identification, we improve on previous studies by consolidating supplementary information from multiple data sources, and empower biosecurity agencies to confidently identify high-risk fishes in the aquarium trade.

8.

Background

The State of Bavaria is involved in a research program that will lead to the construction of a DNA barcode library for all animal species within its territorial boundaries. The present study provides a comprehensive DNA barcode library for the Geometridae, one of the most diverse of insect families.

Methodology/Principal Findings

This study reports DNA barcodes for 400 Bavarian geometrid species, 98 per cent of the known fauna, and approximately one per cent of all Bavarian animal species. Although 98.5% of these species possess diagnostic barcode sequences in Bavaria, records from neighbouring countries suggest that species-level resolution may be compromised in up to 3.5% of cases. All taxa which apparently share barcodes are discussed in detail. One case of modest divergence (1.4%) revealed a species overlooked by the current taxonomic system: Eupithecia goossensiata Mabille, 1869 stat.n. is raised from synonymy with Eupithecia absinthiata (Clerck, 1759) to species rank. Deep intraspecific sequence divergences (>2%) were detected in 20 traditionally recognized species.

Conclusions/Significance

The study emphasizes the effectiveness of DNA barcoding as a tool for monitoring biodiversity. Open access is provided to a data set that includes records for 1,395 geometrid specimens (331 species) from Bavaria, with 69 additional species from neighbouring regions. Taxa with deep intraspecific sequence divergences are undergoing more detailed analysis to ascertain if they represent cases of cryptic diversity.

9.

Background

Southeast Asia is recognized as a region of very high biodiversity, much of which is currently at risk due to habitat loss and other threats. However, many aspects of this diversity, even for relatively well-known groups such as mammals, are poorly known, limiting ability to develop conservation plans. This study examines the value of DNA barcodes, sequences of the mitochondrial COI gene, to enhance understanding of mammalian diversity in the region and hence to aid conservation planning.

Methodology and Principal Findings

DNA barcodes were obtained from nearly 1900 specimens representing 165 recognized species of bats. All morphologically or acoustically distinct species, based on classical taxonomy, could be discriminated with DNA barcodes except four closely allied species pairs. Many currently recognized species contained multiple barcode lineages, often with deep divergence suggesting unrecognized species. In addition, most widespread species showed substantial genetic differentiation across their distributions. Our results suggest that mammal species richness within the region may be underestimated by at least 50%, and there are higher levels of endemism and greater intra-specific population structure than previously recognized.

Conclusions

DNA barcodes can aid conservation and research by assisting field workers in identifying species, by helping taxonomists determine species groups needing more detailed analysis, and by facilitating the recognition of the appropriate units and scales for conservation planning.

10.

Background

Populus is an ecologically and economically important genus of trees, but distinguishing between wild species is relatively difficult due to extensive interspecific hybridization and introgression, and the high level of intraspecific morphological variation. The DNA barcoding approach is a potential solution to this problem.

Methodology/Principal Findings

Here, we tested the discrimination power of five chloroplast barcodes and one nuclear barcode (ITS) among 95 trees that represent 21 Populus species from western China. Among all single barcode candidates, the discrimination power is highest for the nuclear ITS, progressively lower for chloroplast barcodes matK (M), trnG-psbK (G) and psbK-psbI (P), and trnH-psbA (H) and rbcL (R); the discrimination efficiency of the nuclear ITS (I) is also higher than any two-, three-, or even the five-locus combination of chloroplast barcodes. Among the five combinations of a single chloroplast barcode plus the nuclear ITS, H+I and P+I differentiated the highest and lowest proportion of species, respectively. The highest discrimination rate for the barcodes or barcode combinations examined here is 55.0% (H+I), and discrimination failures usually occurred among species from sympatric or parapatric areas.

Conclusions/Significance

In this case study, we showed that when discriminating Populus species from western China, the nuclear ITS region represents a more promising barcode than any maternally inherited chloroplast region or combination of chloroplast regions. Meanwhile, combining the ITS region with chloroplast regions may improve the barcoding success rate and assist in detecting recent interspecific hybridizations. Failure to discriminate among several groups of Populus species from sympatric or parapatric areas may have been the result of incomplete lineage sorting, frequent interspecific hybridizations and introgressions. We agree with a previous proposal for constructing a tiered barcoding system in plants, especially for taxonomic groups that have complex evolutionary histories (e.g. Populus).

11.
Park DS, Foottit R, Maw E, Hebert PD. PLoS ONE 2011, 6(4): e18749

Background

DNA barcoding, the analysis of sequence variation in the 5′ region of the mitochondrial cytochrome c oxidase I (COI) gene, has been shown to provide an efficient method for the identification of species in a wide range of animal taxa. In order to assess the effectiveness of barcodes in the discrimination of Heteroptera, we examined 344 species belonging to 178 genera, drawn from specimens in the Canadian National Collection of Insects.

Methodology/Principal Findings

Analysis of the COI gene revealed less than 2% intra-specific divergence in 90% of the taxa examined, while minimum interspecific distances exceeded 3% in 77% of congeneric species pairs. Instances where barcodes fail to distinguish species represented clusters of morphologically similar species, except one case of barcode identity between species in different genera. Several instances of deep intraspecific divergence were detected suggesting possible cryptic species.

Conclusions/Significance

Although this analysis encompasses 0.8% of the described global fauna, our results indicate that DNA barcodes will aid the identification of Heteroptera. This advance will be useful in pest management, regulatory and environmental applications and will also reveal species that require further taxonomic research.

12.

Background

In plant breeding, there are two primary applications for DNA markers in selection: 1) selection of known genes using a single marker assay (marker-assisted selection; MAS); and 2) whole-genome profiling and prediction (genomic selection; GS). Typically, marker platforms have addressed only one of these objectives.

Results

We have developed spiked genotyping-by-sequencing (sGBS), which combines targeted amplicon sequencing with reduced representation genotyping-by-sequencing. To minimize the cost of targeted assays, we utilize a small percent of sequencing capacity available in runs of GBS libraries to “spike” amplified targets of a priori alleles tagged with a different set of unique barcodes. This open platform allows multiple, single-target loci to be assayed while simultaneously generating a whole-genome profile. This dual-genotyping approach allows different sets of samples to be evaluated for single markers or whole genome-profiling. Here, we report the application of sGBS on a winter wheat panel that was screened for converted KASP markers and newly-designed markers targeting known polymorphisms in the leaf rust resistance gene Lr34.

Conclusions

The flexibility and low cost of sGBS will enable a range of applications across genetics research. Specifically in breeding applications, the sGBS approach will allow breeders to obtain a whole-genome profile of important individuals while simultaneously targeting specific genes for a range of selection strategies across the breeding program.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1404-9) contains supplementary material, which is available to authorized users.

13.

Background

Widespread uptake of DNA barcoding technology for vascular plants has been slow due to the relatively poor resolution of species discrimination (∼70%) and low sequencing and amplification success of one of the two official barcoding loci, matK. Studies to date have mostly focused on finding a solution to these intrinsic limitations of the markers, rather than posing questions that can maximize the utility of DNA barcodes for plants with the current technology.

Methodology/Principal Findings

Here we test the ability of plant DNA barcodes using the two official barcoding loci, rbcLa and matK, plus an alternative barcoding locus, trnH-psbA, to estimate the species diversity of trees in a tropical rainforest plot. Species discrimination accuracy was similar to findings from previous studies but species richness estimation accuracy proved higher, up to 89%. All combinations which included the trnH-psbA locus performed better at both species discrimination and richness estimation than matK, which showed little enhanced species discriminatory power when concatenated with rbcLa. The utility of the trnH-psbA locus is limited, however, by intraspecific variation that in some angiosperm families takes the form of an inversion and obscures the monophyly of species.

Conclusions/Significance

We demonstrate for the first time, using a case study, the potential of plant DNA barcodes for the rapid estimation of species richness in taxonomically poorly known areas or cryptic populations, revealing a powerful new tool for rapid biodiversity assessment. The combination of the rbcLa and trnH-psbA loci performed better for this purpose than any two-locus combination that included matK. We show that although DNA barcodes fail to discriminate all species of plants, new perspectives and methods on biodiversity value and quantification may overshadow some of these shortcomings by applying barcode data in new ways.

14.

Background

Species number, functional traits, and phylogenetic history all contribute to characterizing the biological diversity in plant communities. The phylogenetic component of diversity has been particularly difficult to quantify in species-rich tropical tree assemblages. The compilation of previously published (and often incomplete) data on evolutionary relationships of species into a composite phylogeny of the taxa in a forest, through such programs as Phylomatic, has proven useful in building community phylogenies although often of limited resolution. Recently, DNA barcodes have been used to construct a robust community phylogeny for nearly 300 tree species in a forest dynamics plot in Panama using a supermatrix method. In that study sequence data from three barcode loci were used to generate a well-resolved species-level phylogeny.

Methodology/Principal Findings

Here we expand upon this earlier investigation and present results on the use of a phylogenetic constraint tree to generate a community phylogeny for a diverse, tropical forest dynamics plot in Puerto Rico. This enhanced method of phylogenetic reconstruction ensures the congruence of the barcode phylogeny with broadly accepted hypotheses on the phylogeny of flowering plants (i.e., APG III) regardless of the number and taxonomic breadth of the taxa sampled. We also compare maximum parsimony versus maximum likelihood estimates of community phylogenetic relationships as well as evaluate the effectiveness of one- versus two- versus three-gene barcodes in resolving community evolutionary history.

Conclusions/Significance

As first demonstrated in the Panamanian forest dynamics plot, the results for the Puerto Rican plot illustrate that highly resolved phylogenies derived from DNA barcode sequence data combined with a constraint tree based on APG III are particularly useful in comparative analysis of phylogenetic diversity and will enhance research on the interface between community ecology and evolution.

15.

Background

Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variation calls simultaneously.

Results

Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.
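The abstract does not define how concordance was computed. One common reading, used here purely as an assumption, is the fraction of sites called on both platforms at which the genotypes agree. The Python sketch below implements that reading; the function name, the site keys and the genotype encoding are hypothetical.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of shared sites at which two call sets report the same genotype.

    calls_a, calls_b: dicts mapping (chromosome, position) to a genotype string.
    Sites present in only one call set are ignored.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two call sets share no sites")
    agree = sum(calls_a[site] == calls_b[site] for site in shared)
    return agree / len(shared)
```

Restricting the comparison to shared sites matters when one data set covers the whole genome and the other only the exome, since most whole-genome calls have no exome counterpart.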

Conclusions

Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.

16.

Background

Next-Generation Sequencing (NGS) technologies have rapidly advanced our understanding of human variation in cancer. To accurately translate the raw sequencing data into practical knowledge, annotation tools, algorithms and pipelines must be developed that keep pace with the rapidly evolving technology. Currently, a challenge exists in accurately annotating multi-nucleotide variants (MNVs). These tandem substitutions, when affecting multiple nucleotides within a single protein codon of a gene, result in a translated amino acid involving all nucleotides in that codon. Most existing variant callers report an MNV as individual single-nucleotide variants (SNVs), often resulting in multiple triplet codon sequences and incorrect amino acid predictions. To correct potentially misannotated MNVs among reported SNVs, a primary challenge resides in haplotype phasing, that is, determining whether the neighboring SNVs are co-located on the same chromosome.
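The annotation problem can be made concrete with a short Python sketch. It is an illustration of the issue, not of the MAC pipeline itself: it assumes the SNVs have already been shown to lie on the same haplotype, and it uses the standard genetic code. The example codon and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (offset, alt_base) substitutions."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def annotate_separately(codon, snvs):
    """Amino acids predicted when each SNV is annotated on its own."""
    return [CODON_TABLE[apply_snvs(codon, [snv])] for snv in snvs]


def annotate_as_mnv(codon, snvs):
    """Amino acid predicted when phased SNVs are applied to the codon together."""
    return CODON_TABLE[apply_snvs(codon, snvs)]
```

For the codon GTG (valine) with G>A at the first position and T>A at the second, annotating each SNV separately predicts methionine (ATG) and glutamate (GAG), whereas the combined change AAG encodes lysine. Neither single-SNV prediction is correct if both substitutions sit on the same chromosome.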

Results

Here we describe MAC (Multi-Nucleotide Variant Annotation Corrector), an integrative pipeline developed to correct potentially mis-annotated MNVs. MAC was designed as an application that only requires an SNV file and the matching BAM file as data inputs. Using an example data set containing 3024 SNVs and the corresponding whole-genome sequencing BAM files, we show that MAC identified eight potentially mis-annotated SNVs, and accurately updated the amino acid predictions for seven of the variant calls.

Conclusions

MAC can identify and correct amino acid predictions that result from MNVs affecting multiple nucleotides within a single protein codon, which cannot be handled by most existing SNV-based variant pipelines. The MAC software is freely available and represents a useful tool for the accurate translation of genomic sequence to protein function.

17.

Background

The plant working group of the Consortium for the Barcode of Life recommended the two-locus combination of rbcL + matK as the plant barcode, yet the combination was shown to successfully discriminate among 907 samples from 550 species at the species level with a probability of 72%. The group admits that the two-locus barcode is far from perfect due to the low identification rate, and the search is not over.

Methodology/Principal Findings

Here, we compared seven candidate DNA barcodes (psbA-trnH, matK, rbcL, rpoC1, ycf5, ITS2, and ITS) from medicinal plant species. Our ranking criteria included PCR amplification efficiency, differential intra- and inter-specific divergences, and the DNA barcoding gap. Our data suggest that the second internal transcribed spacer (ITS2) of nuclear ribosomal DNA represents the most suitable region for DNA barcoding applications. Furthermore, we tested the discrimination ability of ITS2 in more than 6600 plant samples belonging to 4800 species from 753 distinct genera and found that the rate of successful identification with the ITS2 was 92.7% at the species level.
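The "DNA barcoding gap" criterion compares the largest within-species distance with the smallest between-species distance; a marker separates species cleanly when the former is below the latter. The Python sketch below computes both values using uncorrected p-distances on pre-aligned sequences. Both choices are simplifying assumptions rather than the study's procedure, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcoding_gap(records):
    """Return (max intraspecific distance, min interspecific distance).

    records: list of (species_name, aligned_sequence) pairs.
    A value is None when no pair of that kind exists.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=None), min(inter, default=None)
```

A gap exists for the data set when the first returned value is smaller than the second; overlapping values indicate species that the marker cannot reliably distinguish.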

Conclusions

The ITS2 region can potentially be used as a standard DNA barcode to identify medicinal plants and their closely related species. We also propose that ITS2 can serve as a novel universal barcode for the identification of a broader range of plant taxa.

18.

Background

Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Whereas reads from randomly fragmented DNA have arbitrary start positions, the reads from amplicon sequencing have fixed start positions that coincide with the amplicon boundaries. As a result, any variants near the amplicon boundaries can cause misalignments of multiple reads that can ultimately lead to false-positive or false-negative variant calls.

Results

We show that amplicon boundaries are variant calling blind spots where the variant calls are highly inaccurate. We propose that an effective strategy to avoid these blind spots is to incorporate the primer bases in obtaining read alignments and post-processing of the alignments, thereby effectively moving these blind spots into the primer binding regions (which are not used for variant calling). Targeted sequencing data analysis pipelines can provide better variant calling accuracy when primer bases are retained and sequenced.

Conclusions

Read bases beyond the variant site are necessary for analysis of amplicon sequencing data. Enzymatic primer digestion, if used in the target enrichment process, should leave at least a few primer bases to ensure that these bases are available during data analysis. The primer bases should only be removed immediately before the variant calling step to ensure that the variants can be called irrespective of where they occur within the amplicon insert region.
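The recommended order of operations (align with primer bases present, then remove them just before variant calling) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes ungapped alignments and 0-based, half-open reference coordinates, and the function name `clip_primer_bases` is hypothetical. Real pipelines would operate on CIGAR strings in BAM records.

```python
def clip_primer_bases(ref_start, seq, insert_start, insert_end):
    """Trim an ungapped alignment to the amplicon insert region.

    ref_start     : reference position of the first aligned base
    seq           : aligned read bases (no insertions or deletions assumed)
    insert_start,
    insert_end    : half-open interval [insert_start, insert_end) between
                    the two primer binding regions

    Returns (new_ref_start, clipped_seq), or None if the read lies
    entirely within a primer binding region.
    """
    ref_end = ref_start + len(seq)
    keep_start = max(ref_start, insert_start)
    keep_end = min(ref_end, insert_end)
    if keep_start >= keep_end:
        return None
    return keep_start, seq[keep_start - ref_start:keep_end - ref_start]


# Hypothetical amplicon spanning 100..130 with 5-base primers on each side,
# so the insert is [105, 125). The read was aligned with primers included.
read = "A" * 5 + "C" * 20 + "G" * 5
clipped = clip_primer_bases(100, read, 105, 125)
```

Because clipping happens after alignment, the primer bases still anchor the read ends during mapping, which is the property the abstract argues for.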

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1073) contains supplementary material, which is available to authorized users.

19.
Cook MA, Chan CK, Jorgensen P, Ketela T, So D, Tyers M, Ho CY. PLoS ONE 2008, 3(2): e1546

Background

Molecular barcode arrays provide a powerful means to analyze cellular phenotypes in parallel through detection of short (20–60 base) unique sequence tags, or “barcodes”, associated with each strain or clone in a collection. However, the costs of current methods for microarray construction, whether by in situ oligonucleotide synthesis or by ex situ coupling of modified oligonucleotides to the slide surface, are often prohibitive for large-scale analyses.

Methodology/Principal Findings

Here we demonstrate that unmodified 20mer oligonucleotide probes printed on conventional surfaces show hybridization signals comparable to those of covalently linked 5′-amino-modified probes. As a test case, we undertook a systematic cell size analysis of the budding yeast Saccharomyces cerevisiae genome-wide deletion collection by size separation of the deletion pool, followed by determination of strain abundance in size fractions by barcode arrays. We demonstrate that the properties of a spotted 20mer oligonucleotide barcode microarray with 13K unique features compare favorably with those of an analogous covalently linked oligonucleotide array. Further, cell size profiles obtained with the size selection/barcode array approach recapitulate previous cell size measurements of individual deletion strains. Finally, through atomic force microscopy (AFM), we characterize the mechanism of hybridization to unmodified barcode probes on the slide surface.

Conclusions/Significance

These studies push the lower limit of probe size in genome-scale unmodified oligonucleotide microarray construction and demonstrate a versatile, cost-effective and reliable method for molecular barcode analysis.

20.

Background

Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics.

Results

We developed SUGAR (subtile-based GUI-assisted refiner), a software tool that handles ultra-high-throughput data through a user-friendly graphical user interface (GUI) with interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors that occurred during sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or by GUI-assisted operations implemented in SUGAR. The automated data-cleaning function, which is based on sequence read quality (Phred) scores, was applied to a public whole human genome sequencing dataset, and we showed that the overall mapping quality improved.
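The idea of removing reads from error-affected flowcell regions based on Phred scores can be sketched as below. This is not SUGAR's implementation: it works at whole-tile rather than subtile resolution, assumes Illumina-style read headers (`@instrument:run:flowcell:lane:tile:x:y ...`) with Phred+33 quality strings, and uses a hypothetical threshold and hypothetical function names.

```python
from collections import defaultdict


def tile_of(header):
    """Extract (lane, tile) from an Illumina-style FASTQ header."""
    fields = header.lstrip("@").split(" ")[0].split(":")
    return fields[3], fields[4]


def mean_phred(qual, offset=33):
    """Mean Phred score of one read (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_low_quality_tiles(records, min_tile_mean=20.0):
    """Drop all reads from tiles whose average read quality is too low.

    records: list of (header, sequence, quality_string) tuples with
             non-empty quality strings.
    Returns (kept_records, bad_tiles). The tile score is the mean of the
    per-read mean Phred scores; the threshold is an illustrative choice.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for header, _seq, qual in records:
        key = tile_of(header)
        sums[key] += mean_phred(qual)
        counts[key] += 1
    bad = {k for k in sums if sums[k] / counts[k] < min_tile_mean}
    kept = [r for r in records if tile_of(r[0]) not in bad]
    return kept, bad


# Toy reads (hypothetical): tile 1101 is clean, tile 1102 is degraded.
reads = [
    ("@M1:7:FC1:1:1101:1000:2000 1:N:0:1", "ACGT", "IIII"),  # Phred 40
    ("@M1:7:FC1:1:1101:1200:2100 1:N:0:1", "ACGT", "IIII"),
    ("@M1:7:FC1:1:1102:1000:2000 1:N:0:1", "ACGT", "####"),  # Phred 2
]
kept, bad = filter_low_quality_tiles(reads)
```

A subtile version would additionally bin reads by the x and y coordinates in the header, which gives the finer spatial resolution needed to isolate localized artifacts such as air bubbles.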

Conclusion

The detailed data evaluation and cleaning enabled by SUGAR reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. The software will therefore be especially useful for controlling the quality of variant calls for low-abundance cell populations (e.g., cancer cells) in samples affected by technical errors in the sequencing procedure.
