Similar articles
20 similar articles found.
1.

Background

DNA barcodes are short unique sequences used to label DNA- or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their positions identified. In some cases (e.g., with PacBio SMRT sequencing), the position of the barcode and its DNA context are not well defined: many reads start inside the genomic insert, so adjacent primers may be missed. The problem is further complicated by coincidental similarities between barcode sequences and the reference DNA. A robust strategy is therefore required to detect barcoded reads without producing large numbers of false positives or false negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Because existing FDR methods cannot be applied directly to this problem, we present an adapted FDR method suitable for detecting barcoded reads and suggest possible improvements.

Results

In our analysis, barcode sequences showed high rates of coincidental similarity with the Mus musculus reference DNA. The problem became more acute as barcode length decreased and as the number of barcodes in the set increased. The method presented here controls the tail-area-based false discovery rate to distinguish barcoded from unbarcoded reads, which helps establish the highest acceptable minimal distance between reads and barcode sequences. In a proof-of-concept experiment, we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99%, at 99% precision, when the adjacent primer sequence was incorporated into the analysis, and a paired-end strategy improved the analysis further. When we analysed the data for ultraviolet-induced sequence variants in the Atp1a1 gene of C57BL/6 murine melanocytes that confer resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.

Conclusion

Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow an appropriate trade-off between sensitivity and precision to be chosen.
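A minimal sketch of a tail-area FDR threshold search, assuming that distances of the same reads to a set of decoy barcodes (for example, shuffled versions of the real ones) serve as the null model. The helper names `best_distance`, `tail_area_fdr` and `max_threshold` are hypothetical, and Hamming distance over sliding windows stands in for whatever alignment score the authors used; this is not their implementation.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_distance(read: str, barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance between any barcode and any same-length window of the read.

    Scanning every window reflects that the barcode position is not fixed
    (reads may start inside the genomic insert).
    """
    best = len(read)  # sentinel returned if no barcode fits inside the read
    for bc in barcodes:
        for start in range(len(read) - len(bc) + 1):
            best = min(best, hamming(read[start:start + len(bc)], bc))
    return best


def tail_area_fdr(target_d: Sequence[int], decoy_d: Sequence[int], t: int) -> float:
    """Estimated FDR when every read with distance <= t is called barcoded.

    Assumption: decoy hits at distance <= t estimate the number of chance matches.
    """
    hits = sum(d <= t for d in target_d)
    chance = sum(d <= t for d in decoy_d)
    return chance / hits if hits else 0.0


def max_threshold(target_d: Sequence[int], decoy_d: Sequence[int], alpha: float, max_d: int) -> int:
    """Largest distance threshold whose estimated FDR is <= alpha (-1 if none)."""
    chosen = -1
    for t in range(max_d + 1):
        if tail_area_fdr(target_d, decoy_d, t) <= alpha:
            chosen = t
    return chosen
```

For toy distances `target = [0, 0, 1, 5, 6]` and `decoy = [4, 5, 6, 6, 5]`, `max_threshold(target, decoy, 0.05, 6)` returns 3: at threshold 4 one decoy already matches, pushing the estimated FDR to 1/3. Lowering `alpha` trades sensitivity for precision, which is the trade-off the abstract describes.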

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-264) contains supplementary material, which is available to authorized users.

2.
Many parallel applications of Next-Generation Sequencing (NGS) technologies require short barcodes that can accurately multiplex a large number of samples. To meet these competing requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, binary BCH and pseudo-quaternary Hamming codes have been proposed as alternatives. However, these codes either fail to offer a fine-grained choice of barcode size (BCH) or have intrinsically poor error-correcting abilities (Hamming). Here, we introduce the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, quaternary LDPC codes may be particularly advantageous because of their lower rates of read loss and undetected sample misidentification. Even at mismatch error rates of 10⁻² per base, 24-nt LDPC barcodes can multiplex roughly 2000 samples with a sample misidentification error rate on the order of 10⁻⁹, at the cost of a read-loss rate on the order of only 10⁻⁶.
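The trade-off between read losses and misidentification can be illustrated with generic bounded-distance decoding. This sketch uses a toy quaternary repetition code, not a shortened BCH or LDPC code, and the function names are hypothetical: a code with minimum distance d_min corrects up to t = (d_min - 1) // 2 substitutions, and observed barcodes farther than t from every codeword are discarded rather than guessed.

```python
from itertools import combinations
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(code: Sequence[str]) -> int:
    """Minimum pairwise Hamming distance d_min of a barcode set."""
    return min(hamming(a, b) for a, b in combinations(code, 2))


def make_decoder(code: Sequence[str]):
    """Return a decoder correcting up to t = (d_min - 1) // 2 substitutions."""
    t = (min_distance(code) - 1) // 2

    def decode(observed: str) -> Optional[str]:
        for c in code:
            if hamming(observed, c) <= t:
                return c  # unique: spheres of radius t around codewords are disjoint
        return None  # too many errors: count as a read loss, not a guess

    return decode
```

With `["AAAAA", "CCCCC", "GGGGG", "TTTTT"]` (d_min = 5, t = 2), `"AACCA"` decodes to `"AAAAA"`, while `"ACCGA"` is three substitutions from two codewords and is rejected. Rejecting instead of guessing raises read losses but keeps undetected misidentifications rare.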

3.
DNA-assisted proteomics technologies enable ultra-sensitive measurements in multiplex format using DNA-barcoded affinity reagents. Although antibodies are now available against nearly the complete human proteome, most are not accessible at the quantity, concentration, or purity recommended for most bioconjugation protocols. Here, we introduce a magnetic bead-assisted DNA-barcoding approach that can be applied to several antibodies in parallel and reduces the required reagent quantities up to a thousand-fold. Successful DNA barcoding and retained antibody functionality were demonstrated in sandwich immunoassays and standard quantitative Immuno-PCR assays. Specific DNA barcoding of antibodies for multiplex applications was shown on suspension bead arrays with read-out on a massively parallel sequencing platform, a procedure denoted Immuno-Sequencing. Finally, human plasma samples were analyzed to demonstrate the functionality of the barcoded antibodies in the intended proteomics applications.

4.
Here we demonstrate a method for unbiased multiplexed deep sequencing of RNA and DNA libraries using a novel, efficient and adaptable barcoding strategy called Post Amplification Ligation-Mediated (PALM) barcoding. PALM barcoding is performed as the very last step of library preparation, which eliminates a potential source of barcode-induced bias and allows as many barcodes as needed to be synthesized. We sequenced PALM-barcoded microRNA (miRNA) and DNA reference samples and quantified the barcode-induced bias relative to the same reference samples prepared with the Illumina TruSeq barcoding strategy. The Illumina TruSeq small RNA strategy introduces the barcode during the PCR step using differentially barcoded primers, whereas the TruSeq DNA strategy introduces it before the PCR step by ligation of differentially barcoded adaptors. The results show virtually no bias between the differentially barcoded miRNA and DNA samples for either the PALM or the TruSeq sample preparation method. We also multiplexed miRNA reference samples using pre-PCR barcode ligation; this barcoding strategy resulted in significant bias.

5.
Multiplexed high-throughput pyrosequencing is currently limited in complexity (the number of samples sequenced in parallel) and in capacity (the number of sequences obtained per sample). Physically segregating the sequencing platform into a fixed number of channels allows limited multiplexing but obscures available sequencing space. To overcome these limitations, we devised a novel barcoding approach that allows DNA from independent samples to be pooled and sequenced together and facilitates subsequent segregation of sequencing capacity. Forty-eight forward–reverse barcode pairs are described; each forward and each reverse barcode is unique with respect to at least 4 nucleotide positions. With the improved read lengths of pyrosequencers, combining forward and reverse barcodes allows as many as n² independent libraries to be sequenced for each set of n forward and n reverse barcodes, for each defined set of cloning linkers. In two pilot series of barcoded sequencing on the GS20 Sequencer (454/Roche), over 99.8% of the obtained sequences could be assigned to 25 independent, uniquely barcoded libraries based on the presence of either a perfect forward or a perfect reverse barcode. The false-discovery rate, measured as the percentage of sequences with unexpected perfect pairings of unmatched forward and reverse barcodes, was estimated to be <0.005%.
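The assignment rule and the false-discovery estimate described above can be sketched as follows, assuming barcode tags have already been extracted from each read and that each library is defined by one (forward, reverse) pair. The function name `classify` and the output labels are hypothetical; unexpected perfect pairings are counted separately so that their share of all reads gives the FDR estimate.

```python
from collections import Counter, defaultdict


def classify(reads, libraries):
    """Assign reads to libraries from perfect forward/reverse barcode matches.

    reads: iterable of (fwd_tag, rev_tag) strings extracted from each read.
    libraries: dict mapping (fwd_barcode, rev_barcode) -> library name.
    """
    by_fwd, by_rev = defaultdict(set), defaultdict(set)
    for (f, r), name in libraries.items():
        by_fwd[f].add(name)
        by_rev[r].add(name)

    counts = Counter()
    for f, r in reads:
        if (f, r) in libraries:
            counts[libraries[(f, r)]] += 1  # expected perfect pair
        elif f in by_fwd and r in by_rev:
            counts["unexpected_pair"] += 1  # perfect but mismatched pair: FDR signal
        elif f in by_fwd and len(by_fwd[f]) == 1:
            counts[next(iter(by_fwd[f]))] += 1  # unique perfect forward barcode
        elif r in by_rev and len(by_rev[r]) == 1:
            counts[next(iter(by_rev[r]))] += 1  # unique perfect reverse barcode
        else:
            counts["unassigned"] += 1
    return counts
```

The estimated false-discovery rate is then `counts["unexpected_pair"] / total_reads`. When n forward and n reverse barcodes are combined into n² libraries, a single barcode no longer identifies a library, so the single-barcode fallbacks above only fire for barcodes used by exactly one library.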

6.

Background

PCR amplicon sequencing has been widely used as a targeted approach for both DNA and RNA sequence analysis. High multiplex PCR has further enabled the enrichment of hundreds of amplicons in one simple reaction. At the same time, the performance of PCR amplicon sequencing can be degraded by issues such as high rates of duplicate reads, polymerase artifacts and PCR amplification bias. Researchers have recently made good progress in addressing these shortcomings by incorporating molecular barcodes into PCR primer design. So far, however, most work has been demonstrated with one to a few primer pairs, which limits the size of the region that can be analyzed.

Results

We developed a simple protocol that enables the use of molecular barcodes in high multiplex PCR with hundreds of amplicons. Using this protocol and reference materials, we demonstrated its application to accurate variant calling at very low allele fractions over a large region and to targeted RNA quantification. We also evaluated the protocol's utility in profiling FFPE samples.

Conclusions

We demonstrated the successful implementation of molecular barcodes in high multiplex PCR, at a multiplex scale many times higher than in earlier work. The new protocol combines the benefits of high multiplex PCR and molecular barcodes: analysis of a very large region, a low DNA input requirement, very good reproducibility, and the ability to detect mutations at frequencies as low as 1% with minimal false positives (FP).
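A minimal sketch of how molecular barcodes suppress polymerase and sequencing errors: reads sharing a barcode are collapsed into one consensus base before the variant fraction is computed. It assumes reads have already been grouped by amplicon and reduced to (barcode, base at the site of interest); `variant_fraction`, `min_family_size` and `min_agreement` are hypothetical names and thresholds, not taken from the protocol.

```python
from collections import Counter, defaultdict


def variant_fraction(reads, ref_base, min_family_size=2, min_agreement=0.7):
    """Fraction of barcode families whose consensus base differs from ref_base.

    reads: iterable of (molecular_barcode, base) for one genomic position.
    Families smaller than min_family_size, or whose majority base falls below
    min_agreement, are dropped as unreliable.
    """
    families = defaultdict(Counter)
    for barcode, base in reads:
        families[barcode][base] += 1

    calls = []
    for bases in families.values():
        total = sum(bases.values())
        if total < min_family_size:
            continue
        base, n = bases.most_common(1)[0]
        if n / total >= min_agreement:
            calls.append(base)  # one consensus call per original molecule

    if not calls:
        return 0.0
    return sum(b != ref_base for b in calls) / len(calls)
```

In a toy input where one family of four reads carries a single stray `G`, that error is voted away, and the result counts original molecules rather than PCR duplicates.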

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1806-8) contains supplementary material, which is available to authorized users.

7.
We consider the design and evaluation of short barcodes, between six and eight nucleotides long, used for parallel sequencing on platforms where substitution errors dominate. Such codes should not only have good error-correction properties; their code words should also fulfil certain biological constraints (experimental parameters). We compare published barcodes with codes obtained by two new construction methods: one based on the best currently known linear codes, and a simple randomized construction. The evaluation considers error-correction capability, barcode set size, experimental parameters, and fundamental bounds on code size and distance properties. We provide a list of codes for lengths between six and eight nucleotides; for length eight, two substitution errors can be corrected. In fact, no code with a larger minimum distance can exist.
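A simple randomized construction can be sketched as a greedy filter: draw random words, keep those that satisfy the biological constraints and lie at least d_min substitutions from every word already accepted. The constraint values (GC fraction 0.4 to 0.6, homopolymer runs of at most 2) and the name `random_code` are illustrative assumptions, not the paper's parameters. Length 8 with d_min = 5 gives t = 2 correctable substitutions.

```python
import itertools
import random


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(s):
    return (s.count("G") + s.count("C")) / len(s)


def max_homopolymer(s):
    """Length of the longest run of identical bases."""
    return max(len(list(run)) for _, run in itertools.groupby(s))


def random_code(length, d_min, trials=20000, gc=(0.4, 0.6), max_run=2, seed=0):
    """Greedy randomized barcode set with minimum Hamming distance d_min."""
    rng = random.Random(seed)
    code = []
    for _ in range(trials):
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if not gc[0] <= gc_fraction(cand) <= gc[1]:
            continue  # biological constraint: balanced base composition
        if max_homopolymer(cand) > max_run:
            continue  # biological constraint: avoid long homopolymers
        if all(hamming(cand, c) >= d_min for c in code):
            code.append(cand)
    return code
```

Such greedy random codes are typically smaller than the best known linear codes of the same length and distance, which is the comparison the abstract draws.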

8.
Biotechnological and biomolecular advances have introduced novel uses for DNA, such as DNA computing, storage, and encryption. For these applications, DNA sequence design must maximize desired (and minimize undesired) hybridizations, in which two single DNA strands pair to form one double-stranded molecule. Here, we propose a novel constraint for designing DNA sequences based on thermodynamic properties. Existing constraints for DNA design are based on the Hamming distance, which does not address the thermodynamic properties of DNA sequences. Using an improved genetic algorithm, we designed DNA sequence sets that satisfy different distance constraints and used a free energy gap based on the minimum free energy (MFE) to evaluate the thermodynamic properties of each set. Compared with the best Hamming-distance constraints, our method yielded sets of better thermodynamic quality. We then used the improved genetic algorithm to obtain lower-bound DNA sequence sets, and we discuss the effects of the novel constraint's parameters on the free energy gap.
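For comparison, the Hamming-distance constraints that the proposed thermodynamic constraint is measured against can be checked directly. This sketch assumes the commonly used pair of conditions, H(x, y) ≥ d and H(x, rc(y)) ≥ d including x = y, where rc is the reverse complement; `meets_hamming_constraints` is a hypothetical helper. The MFE-based free energy gap itself needs a nearest-neighbour energy model and is not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def meets_hamming_constraints(seqs, d):
    """True if all strands differ pairwise at >= d positions and no strand is
    within d - 1 substitutions of any strand's reverse complement (its own included)."""
    for i, x in enumerate(seqs):
        if hamming(x, revcomp(x)) < d:  # near-palindromic strands can self-pair
            return False
        for y in seqs[i + 1:]:
            if hamming(x, y) < d or hamming(x, revcomp(y)) < d:
                return False  # too similar, or likely to cross-hybridize
    return True
```

Such a check only counts mismatches; two sets that both pass it can still differ widely in hybridization free energy, which is the gap the thermodynamic constraint targets.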

9.
Barcoded vectors are promising tools for investigating clonal diversity and dynamics in hematopoietic gene therapy. Analysis of clones marked with barcoded vectors requires accurate identification of potentially large numbers of individually rare barcodes whose exact number, sequence identity and abundance are unknown. This is an inherently challenging application, and whether contemporary next-generation sequencing technologies can handle it is unresolved. To explore this question empirically, without prior assumptions, we sequenced barcode libraries of known complexity. Libraries containing 1, 10 and 100 Sanger-sequenced barcodes were sequenced on an Illumina platform, and a 100-barcode library was also sequenced on a SOLiD platform. In the 1- and 10-barcode libraries, true barcodes were distinguished from false barcodes generated by sequencing error by an abundance difference of several orders of magnitude. In the 100-barcode libraries, however, the abundances of expected and false barcodes overlapped, and the two could not be resolved by bioinformatic filtering and clustering strategies. In independent sequencing runs, multiple false-positive barcodes appeared at higher abundance than known barcodes, despite their confirmed absence from the original library. Such errors, which may affect barcoding studies in an application-dependent manner, are consistent with the existence of both stochastic and systematic error, the mechanism of which is yet to be fully resolved.
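One common abundance-based filtering heuristic, resembling the "directional" rule used by UMI-tools, folds a barcode into a more abundant neighbour one substitution away when the neighbour's count is at least 2n - 1. This is an illustrative sketch, not the authors' pipeline, and `collapse_errors` is a hypothetical name. It also shows the limitation reported above: when true and false barcodes have similar abundances, the rule cannot separate them.

```python
def hamming(a, b):
    """Count mismatches between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def collapse_errors(counts, ratio=2):
    """Greedily merge likely sequencing-error barcodes into abundant neighbours.

    A barcode is absorbed by an already accepted barcode one substitution away
    if that neighbour's count is >= ratio * count - 1; otherwise it is kept as
    a putative true barcode.
    """
    accepted = {}
    for bc in sorted(counts, key=counts.get, reverse=True):  # most abundant first
        for hub in accepted:
            if hamming(bc, hub) == 1 and counts[hub] >= ratio * counts[bc] - 1:
                accepted[hub] += counts[bc]  # fold the error reads into the parent
                break
        else:
            accepted[bc] = counts[bc]
    return accepted
```

With counts `{"ACGT": 1000, "ACGA": 20, "TTTT": 500, "TTTA": 400}`, the rare `ACGA` is absorbed into `ACGT`, but `TTTA` survives as a separate barcode because 400 reads is too close to 500 to call it an error.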

10.
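Objective: To establish a multiplex PCR system for microsatellite markers in laboratory miniature pigs and to carry out genetic monitoring of laboratory pig populations. Methods: Microsatellite primers labeled with three different fluorescent dyes were combined with sequencing on an ABI 3700 genetic analyzer; by screening primers and optimizing reaction conditions, a stable multiplex PCR system suitable for genetic quality control of laboratory miniature pigs was established. On this basis, genetic variation in inbred populations of laboratory miniature pigs was examined to verify the efficiency of the system. Results: Two suitable combinations were selected. Combination 1 comprised the loci SW742, S0228 and S0218, with annealing temperatures of 58 °C and 56 °C; combination 2 comprised the three loci S0155, SW902 and S0227, with annealing temperatures of 60 °C and 58 °C. Within each combination, different loci were labeled with different fluorescent dyes. The combinations were also used to detect genetic variation in laboratory miniature pig populations. Conclusion: A multiplex PCR system for microsatellite marker detection in three Chinese laboratory miniature pig strains was preliminarily established, providing an initial technical basis for rapid, high-throughput and accurate genetic monitoring of miniature pigs.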
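The abstract does not state which summary statistics were used for genetic monitoring. Observed and expected heterozygosity are common choices for microsatellite genotypes, and this sketch computes both for one locus, assuming each individual's genotype is a pair of allele sizes. The allele sizes in the usage note are hypothetical; in a well-maintained inbred colony both values are expected to be low.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Observed (H_o) and expected (H_e = 1 - sum p_i^2) heterozygosity at one locus.

    genotypes: list of (allele1, allele2) fragment sizes, one pair per individual.
    """
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n
    alleles = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n  # diploid: two alleles per individual
    h_exp = 1 - sum((count / total) ** 2 for count in alleles.values())
    return h_obs, h_exp
```

For hypothetical genotypes `[(150, 150), (150, 154), (154, 154), (150, 154)]`, both alleles have frequency 0.5, so H_o = H_e = 0.5; a locus fixed for one allele gives (0.0, 0.0).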

11.
By virtue of advances in next-generation sequencing technologies, new genome sequences become available almost daily, and the tempo of these advances is accelerating, promising greater depth and breadth. In light of these extraordinary advances, fast, parallel methods to define gene function become ever more important. Collections of genome-wide deletion mutants in yeasts and E. coli have served as workhorses for the functional characterization of genes, but this approach does not scale: current gene-deletion approaches require each of the thousands of genes in a genome to be deleted and verified, and only after this work is complete can high-throughput phenotyping begin. Over the past decade, our laboratory has refined a portfolio of competitive, miniaturized, high-throughput genome-wide assays that can be performed in parallel. This parallelization is possible because each mutant carries a DNA "tag", or "barcode", that serves as a proxy for the mutation, so barcode abundance can be measured to assess mutant fitness. In this study, we seek to fill the gap between DNA sequence and barcoded mutant collections. To accomplish this, we introduce a combined transposon disruption-barcoding approach that opens up parallel barcode assays to newly sequenced but poorly characterized microbes. To illustrate the approach, we present a new Candida albicans barcoded disruption collection and describe how both microarray-based and next-generation sequencing-based platforms can be used to collect 10,000 to 1,000,000 gene-gene and drug-gene interactions in a single experiment.
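Because each barcode stands in for one mutant, fitness in a pooled competition can be scored from barcode counts. This sketch uses a generic log2 fold change of normalized counts between a treated and a control pool; the pseudocount and the function name `barcode_fitness` are assumptions, not the laboratory's published scoring method.

```python
import math


def barcode_fitness(control, treated, pseudocount=0.5):
    """log2 ratio of each barcode's relative abundance (treated vs control).

    control, treated: dicts mapping barcode -> read count.
    Negative scores suggest mutants depleted under the treatment.
    """
    c_total = sum(control.values())
    t_total = sum(treated.values())
    scores = {}
    for bc in control.keys() | treated.keys():
        c = (control.get(bc, 0) + pseudocount) / c_total  # pseudocount avoids log(0)
        t = (treated.get(bc, 0) + pseudocount) / t_total
        scores[bc] = math.log2(t / c)
    return scores
```

Normalizing by total counts matters: in the test data, mutant `a` keeps the same raw count but still scores positive, because mutant `b` dropped out and `a` now makes up a larger share of the pool.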

12.
13.
Multiplex PCR is one of the important methods for enriching target DNA for next-generation sequencing. Non-specific amplification and interactions between primers are the pivotal challenges of multiplex PCR. Here, we introduce novel blunt hairpin primers that effectively reduce primer dimers and mispriming events, and we used a pair of auxiliary primers to enhance PCR efficiency. We simultaneously amplified 89 target regions from 44 samples and sequenced all amplicons on the Ion Torrent PGM platform. Among all filtered amplicons (3438 distinct amplicons), 99.7%, 97.6%, 90.1% and 72.8% had sequencing depths within a 200-, 100-, 50- and 25-fold range, respectively. Sequencing depth varied less than 27-fold among all samples. We also amplified the multiplex regions with blunt hairpin, stick hairpin and normal linear primers; the blunt hairpin primers significantly reduced the amount of primer dimers and non-specific products. These results show that multiplex PCR with blunt hairpin primers is a flexible, specific and economical target-region capture approach for next-generation sequencing.

14.
A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the number of laboratory procedures required to sequence the unknown DNA that lies between contiguous sequences. Multiplex sequencing, a novel procedure in which multiple PCR primers are used in a single sequencing reaction, is used to interpret the multiplex PCR results. Two protocols are presented: one that minimizes pipetting and another that minimizes the number of reactions. The pipette-optimized multiplex PCR method has been employed, with excellent results, in the final phases of closing the Streptococcus pneumoniae genome sequence.

15.
We present dial-out PCR, a highly parallel method for retrieving accurate DNA molecules for gene synthesis. A complex library of DNA molecules is modified with unique flanking tags before massively parallel sequencing. Tag-directed primers then enable the retrieval of molecules with desired sequences by PCR. Dial-out PCR enables multiplex in vitro clone screening and is a compelling alternative to in vivo cloning and Sanger sequencing for accurate gene synthesis.
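The selection step between sequencing and retrieval can be sketched as a lookup from tag pairs to observed sequences: keep tags whose reads consistently show a perfect copy of a designed sequence, then use those tags to direct the retrieval primers. The read-count threshold and the function name `pick_tags` are hypothetical; primer design for the chosen tags is not shown.

```python
from collections import Counter, defaultdict


def pick_tags(reads, designs, min_reads=2):
    """Choose one tag pair per designed sequence for dial-out retrieval.

    reads: iterable of (tag_pair, sequence) from massively parallel sequencing.
    designs: set of desired (error-free) sequences.
    Returns {design: tag_pair}, preferring tags with more supporting reads.
    """
    by_tag = defaultdict(Counter)
    for tag, seq in reads:
        by_tag[tag][seq] += 1

    chosen = {}
    for tag, seqs in sorted(by_tag.items(), key=lambda kv: -sum(kv[1].values())):
        if len(seqs) != 1:
            continue  # conflicting reads under one tag: mixed or erroneous molecule
        seq, n = next(iter(seqs.items()))
        if seq in designs and n >= min_reads and seq not in chosen:
            chosen[seq] = tag  # perfect, well-supported copy of the design
    return chosen
```

Requiring every read under a tag to agree, and at least `min_reads` of them, is a conservative choice that guards against selecting a molecule whose apparent correctness comes from a single lucky read.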

16.
Species identifications based on DNA barcoding rely on the correct identity of previously barcoded specimens, but little attention has been given to whether deposited barcodes can be linked to the species' name-bearing type. The information associated with COX1 sequences in the two most commonly used barcode repositories, GenBank and the Barcode of Life Data System (BOLD), is often insufficient for subsequent evaluation of the robustness of the identification procedure. We argue that DNA barcoding and taxonomy alike will benefit from richer annotations of barcoded specimens, as these allow the initial specimen identification to be validated and re-evaluated. The aim should be to connect the specimens from which reference barcodes are generated closely to the holotype through straightforward taxonomy and geographical and genetic correlations. Annotations should also include voucher specimens and collector/identifier information. We examine two case studies based on empirical data in which barcoding and taxonomy benefit from increased information content. On the basis of data from the first case study, we designate a barcoded neotype of the European medicinal leech, Hirudo medicinalis, on morphological and geographical grounds.

17.
Multiplexing is vital for utilizing the full potential of next-generation sequencing technologies. We report TagGD (DNA-based Tag Generator and Demultiplexor), a fully customisable, fast and accurate software package that can generate thousands of barcodes satisfying user-defined constraints and can guarantee full demultiplexing accuracy. The barcodes are designed to minimise their interference with the experiment, and insertion, deletion and substitution events are considered both when designing and when demultiplexing barcodes. 20,000 barcodes of length 18 were designed in 5 minutes, and 2 million barcoded Illumina HiSeq-like reads generated with an error rate of 2% were demultiplexed with full accuracy in 5 minutes. We believe that our software meets a central demand of current high-throughput biology and can be used in any field that handles large numbers of samples. The software is available on GitHub (https://github.com/pelinakan/UBD.git).
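Because insertions and deletions are considered alongside substitutions, demultiplexing needs an edit (Levenshtein) distance rather than a Hamming distance. The sketch below assigns an already-extracted barcode region to its nearest barcode and refuses ties or distant matches. It illustrates the general idea only; it is not TagGD's algorithm or API, and `demultiplex` and `max_dist` are hypothetical names.

```python
def levenshtein(a, b):
    """Edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def demultiplex(observed, barcodes, max_dist):
    """Return the unique nearest barcode within max_dist edits, else None."""
    scored = sorted((levenshtein(observed, bc), bc) for bc in barcodes)
    best_d, best_bc = scored[0]
    if best_d > max_dist:
        return None  # too far from every barcode
    if len(scored) > 1 and scored[1][0] == best_d:
        return None  # ambiguous: two barcodes equally close
    return best_bc
```

Here `"ACGACGT"` (one base deleted from `"ACGTACGT"`) is still assigned correctly, although it differs from the original at several aligned positions under a position-by-position comparison. Guaranteeing full accuracy additionally requires designing barcodes whose pairwise edit distances exceed twice the tolerated error count.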

18.
• Premise of the study: Genome survey sequences (GSS) from massively parallel sequencing have the potential to provide large, cost-effective data sets for phylogenetic inference, to replace single genes or spacer regions as DNA barcodes, and to provide a plethora of data for other comparative molecular evolution studies. Here we report on the application of this method to estimating the molecular phylogeny of core Asparagales, investigating plastid gene losses, assembling complete plastid genomes, and determining the type and quality of assembled genomic data attainable from Illumina 80–120-bp reads. • Methods: We sequenced total genomic DNA from samples in two lineages of monocotyledonous plants, Poaceae and Asparagales, on the Illumina platform in a multiplex arrangement. We compared reference-based assemblies to de novo contigs, evaluated the consistency of assemblies obtained with various reference sequences, and assessed our methods for obtaining sequence assemblies in nonmodel taxa. • Key results: Our method returned reliable, robust organellar and nrDNA sequences in a variety of plant lineages. High-quality assemblies do not depend on genome size, the amount of plastid DNA in the total genomic DNA template, or the relatedness of the reference sequences available for assembly. Phylogenetic results revealed familial and subfamilial relationships within Asparagales with high bootstrap support, although the monotypic genus Aphyllanthes was placed with only moderate confidence. • Conclusions: The well-supported molecular phylogeny provides evidence for the delineation of subfamilies within core Asparagales. With advances in technology and bioinformatics tools, massively parallel sequencing will continue to become easier and more affordable for phylogenomic and molecular evolutionary biology investigations.

19.
We combined components of a previous assay, the Molecular Inversion Probe (MIP), with a complete gap-filling strategy, creating a versatile, powerful one-primer multiplex amplification system. As a proof of concept, this novel method, which employs a Connector Inversion Probe (CIPer), was tested as a genetic tool for pathogen diagnosis, typing, and antibiotic resistance screening with two distinct systems: i) a conserved-sequence primer system for genotyping Human Papillomavirus (HPV), a cancer-associated viral agent, and ii) screening for antibiotic resistance mutations in the bacterial pathogen Neisseria gonorrhoeae. We also discuss future applications and advances of the CIPer technology, such as integration with digital amplification and next-generation sequencing methods. Furthermore, we introduce the concept of two-dimensional informational barcodes, i.e. "multiplex multiplexing padlocks" (MMPs). For the readers' convenience, we also provide an online tutorial with a user-interface software application, CIP creator 1.0.1, for custom probe generation from virtually any new or established primer pair.

20.
Discovery of rare mutations in populations: TILLING by sequencing
Discovery of rare mutations in populations requires methods, such as TILLING (Targeting Induced Local Lesions in Genomes), for processing and analyzing many individuals in parallel. Previous TILLING protocols employed enzymatic or physical discrimination of heteroduplexed from homoduplexed target DNA. Using mutant populations of rice (Oryza sativa) and wheat (Triticum durum), we developed a method based on Illumina sequencing of target genes amplified from multidimensionally pooled templates representing 768 individuals per experiment. Parallel processing of sequencing libraries was aided by unique tracer sequences and barcodes, allowing flexibility in the number of targeted genes and species and in the pooling scheme. Sequencing reads were processed and aligned to the reference to identify possible single-nucleotide changes, which were then evaluated for frequency, sequencing quality, intersection pattern in pools, and statistical relevance to produce a Bayesian score with an associated confidence threshold. Discovery was robust in both rice and wheat using either bidimensional or tridimensional pooling schemes. The method compared favorably with other molecular and computational approaches, providing high sensitivity and specificity.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417