Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Massively parallel sequencing is an efficient, well-known method for obtaining sequence data from multiple individual samples. To ensure that sequencing, replication, and oligonucleotide synthesis errors do not result in tags (or barcodes) that are unrecoverable or confused, the tag sequences should be abundant and sufficiently different. Recently, many design methods have been proposed for correcting errors in data using error-correcting codes. The existing tag sets contain small tag sequences, so we used a modified genetic algorithm to improve the lower bound of the tag sets in this study. Compared with previous research, our algorithm is effective for designing sets of DNA tags. Moreover, the GC content determined by existing methods includes an imprecise range. Thus, we improved the GC content determination method to obtain tag sets that control the GC content in a more precise range. Finally, previous studies have only considered perfect self-complementarity. Thus, we considered the crossover between different tags and introduced an improved constraint into the design of tag sets.
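The constraints named in this abstract (pairwise difference between tags, a bounded GC content, and complementarity checks) can be illustrated with a small validity test. This is only a sketch under stated assumptions: the function names, the Hamming-distance measure, the thresholds `min_distance=3` and `gc_range=(0.4, 0.6)`, and the reading of "crossover between different tags" as distance to another tag's reverse complement are all illustrative choices, not the authors' method, and the genetic algorithm itself is not shown.

```python
from itertools import combinations

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(tag):
    """Fraction of bases in the tag that are G or C."""
    return sum(base in "GC" for base in tag) / len(tag)


def reverse_complement(tag):
    """Reverse complement of a DNA tag."""
    return "".join(COMPLEMENT[base] for base in reversed(tag))


def is_valid_tag_set(tags, min_distance=3, gc_range=(0.4, 0.6)):
    """Check a candidate tag set against three illustrative constraints.

    All thresholds are assumptions chosen for the example.
    """
    low, high = gc_range
    for tag in tags:
        # GC content must fall inside the requested range.
        if not low <= gc_fraction(tag) <= high:
            return False
        # Reject tags that are perfectly self-complementary.
        if tag == reverse_complement(tag):
            return False
    for a, b in combinations(tags, 2):
        # Tags must differ enough to tolerate substitution errors.
        if hamming(a, b) < min_distance:
            return False
        # Assumed cross-tag constraint: a tag should not be close to
        # the reverse complement of another tag.
        if hamming(a, reverse_complement(b)) < min_distance:
            return False
    return True
```

A search procedure such as a genetic algorithm could use a check like `is_valid_tag_set` as a feasibility test while trying to enlarge the set.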

2.
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting for amino acid mutations, the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
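The error described above, where one segment is replaced by another of approximately the same mass, can be made concrete with a short check. This is a minimal sketch, not the SPIDER algorithm: the table holds standard monoisotopic residue masses for a subset of amino acids, and the function names and the 0.02 Da tolerance are assumptions made for illustration.

```python
# Monoisotopic residue masses in daltons (subset of the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259,
}


def segment_mass(segment):
    """Sum of residue masses for a short amino acid segment."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def masses_match(seg_a, seg_b, tolerance=0.02):
    """True if two segments are indistinguishable within the tolerance.

    The default tolerance is an illustrative value, not one taken from
    the SPIDER paper.
    """
    return abs(segment_mass(seg_a) - segment_mass(seg_b)) <= tolerance
```

For example, `masses_match("GG", "N")` and `masses_match("GA", "Q")` both hold, which is why a de novo tag can contain `N` where the true peptide has `GG`. A matcher that tolerates such swaps has to treat these segment pairs as equivalent rather than as mismatches.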

3.

Background

Short oligonucleotides can be used as markers to tag and track DNA sequences. For example, barcoding techniques (i.e. Multiplex Identifiers or Indexing) use short oligonucleotides to distinguish between reads from different DNA samples pooled for high-throughput sequencing. A similar technique called molecule tagging uses the same principles but is applied to individual DNA template molecules. Each template molecule is tagged with a unique oligonucleotide prior to polymerase chain reaction. The resulting amplicon sequences can be traced back to their original templates by their oligonucleotide tag. Consensus building from sequences sharing the same tag enables inference of original template molecules, thereby reducing effects of sequencing error and polymerase chain reaction bias. Several independent groups have developed similar protocols for molecule tagging; however, user-friendly software for building consensus sequences from molecule-tagged reads is not readily available or is highly specific for a particular protocol.

Results

MT-Toolbox recognizes oligonucleotide tags in amplicons and infers the correct template sequence. On a set of molecule tagged test reads, MT-Toolbox generates sequences having on average 0.00047 errors per base. MT-Toolbox includes a graphical user interface, command line interface, and options for speed and accuracy maximization. It can be run in serial on a standard personal computer or in parallel on a Load Sharing Facility based cluster system. An optional plugin provides features for common 16S metagenome profiling analysis such as chimera filtering, building operational taxonomic units, contaminant removal, and taxonomy assignments.

Conclusions

MT-Toolbox provides an accessible, user-friendly environment for analysis of molecule tagged reads thereby reducing technical errors and polymerase chain reaction bias. These improvements reduce noise and allow for greater precision in single amplicon sequencing experiments.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-284) contains supplementary material, which is available to authorized users.
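The consensus-building idea described in this abstract (reads sharing a tag are collapsed into one inferred template) can be sketched as a per-position majority vote. This is not MT-Toolbox's implementation. The function name is hypothetical, and the sketch assumes that the tag occupies the first `tag_length` bases of each read, that reads in a group are already aligned, and that a tie goes to the base seen first.

```python
from collections import Counter, defaultdict


def build_consensus(tagged_reads, tag_length):
    """Group reads by their leading tag and take a majority vote per base.

    Assumes (for illustration only) that the tag is the read prefix and
    that reads sharing a tag are aligned without indels.
    """
    groups = defaultdict(list)
    for read in tagged_reads:
        groups[read[:tag_length]].append(read[tag_length:])

    consensus = {}
    for tag, sequences in groups.items():
        # Only vote over positions covered by every read in the group.
        length = min(len(seq) for seq in sequences)
        consensus[tag] = "".join(
            Counter(seq[i] for seq in sequences).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


reads = ["AAACGTA", "AAACGTA", "AAACCTA", "TTTGGGG"]
print(build_consensus(reads, tag_length=3))
```

In the example, the single `C` at the second template position of one `AAA` read is outvoted by the two reads carrying `G`, which is how sporadic sequencing errors are suppressed.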

4.
High‐throughput sequencing (HTS) of PCR amplicons is becoming the method of choice to sequence one or several targeted loci for phylogenetic and DNA barcoding studies. Although the development of HTS has allowed rapid generation of massive amounts of DNA sequence data, preparing amplicons for HTS remains a rate‐limiting step. For example, HTS platforms require platform‐specific adapter sequences to be present at the 5′ and 3′ end of the DNA fragment to be sequenced. In addition, short multiplex identifier (MID) tags are typically added to allow multiple samples to be pooled in a single HTS run. Existing methods to incorporate HTS adapters and MID tags into PCR amplicons are either inefficient, requiring multiple enzymatic reactions and clean‐up steps, or costly when applied to multiple samples or loci (fusion primers). We describe a method to amplify a target locus and add HTS adapters and MID tags via a linker sequence using a single PCR. We demonstrate our approach by generating reference sequence data for two mitochondrial loci (COI and 16S) for a diverse suite of insect taxa. Our approach provides a flexible, cost‐effective and efficient method to prepare amplicons for HTS.

5.
MOTIVATION: In gene discovery projects based on EST sequencing, effective post-sequencing identification methods are important in determining tissue sources of ESTs within pooled cDNA libraries. In the past, such identification efforts have been characterized by higher than necessary failure rates due to the presence of errors within the subsequence containing the oligo tag intended to define the tissue source for each EST. RESULTS: A large-scale EST-based gene discovery program at The University of Iowa has led to the creation of a unique software method named UITagCreator, usable in the creation of large sets of synthetic tissue identification tags. The identification tags provide error detection and correction capability and, in conjunction with automated annotation software, result in a substantial improvement in the accurate identification of the tissue source in the presence of sequencing and base-calling errors. These identification rates are favorable relative to past paradigms. AVAILABILITY: The UITagCreator source code and installation instructions, along with detection software usable in concert with created tag sets, are freely available at http://genome.uiowa.edu/pubsoft/software.html CONTACT: tomc@eng.uiowa.edu

6.
Parallel tagged sequencing on the 454 platform   Cited by: 2 (self-citations: 0, citations by others: 2)
Parallel tagged sequencing (PTS) is a molecular barcoding method designed to adapt the recently developed high-throughput 454 parallel sequencing technology for use with multiple samples. Unlike other barcoding methods, PTS can be applied to any type of double-stranded DNA (dsDNA) sample, including shotgun DNA libraries and pools of PCR products, and requires no amplification or gel purification steps. The method relies on attaching sample-specific barcoding adapters, which include sequence tags and a restriction site, to blunt-end repaired DNA samples by ligation and strand-displacement. After pooling multiple barcoded samples, molecules without sequence tags are effectively excluded from sequencing by dephosphorylation and restriction digestion, and using the tag sequences, the source of each DNA sequence can be traced. This protocol allows for sequencing 300 or more complete mitochondrial genomes on a single 454 GS FLX run, or twenty-five 6-kb plasmid sequences on only one 16th plate region. Most of the reactions can be performed in a multichannel setup on 96-well reaction plates, allowing for processing up to several hundreds of samples in a few days.

7.
We describe a technique, sequence-tagged microsatellite profiling (STMP), to rapidly generate large numbers of simple sequence repeat (SSR) markers from genomic or cDNA. This technique eliminates the need for library screening to identify SSR-containing clones and provides an ~25-fold increase in sequencing throughput compared to traditional methods. STMP generates short but characteristic nucleotide sequence tags for fragments that are present within a pool of SSR amplicons. These tags are then ligated together to form concatemers for cloning and sequencing. The analysis of thousands of tags gives rise to a representational profile of the abundance and frequency of SSRs within the DNA pool, from which low copy sequences can be identified. As each tag contains sufficient nucleotide sequence for primer design, their conversion into PCR primers allows the amplification of corresponding full-length fragments from the pool of SSR amplicons. These fragments permit the full characterisation of an SSR locus and provide flanking sequence for the development of a microsatellite marker. Alternatively, sequence tag primers can be used to directly amplify corresponding SSR loci from genomic DNA, thereby reducing the cost of developing a microsatellite marker to the synthesis of just one sequence-specific primer. We demonstrate the utility of STMP by the development of SSR markers in bread wheat.

8.
We describe a technique, sequence-tagged microsatellite profiling (STMP), to rapidly generate large numbers of simple sequence repeat (SSR) markers from genomic or cDNA. This technique eliminates the need for library screening to identify SSR-containing clones and provides an approximately 25-fold increase in sequencing throughput compared to traditional methods. STMP generates short but characteristic nucleotide sequence tags for fragments that are present within a pool of SSR amplicons. These tags are then ligated together to form concatemers for cloning and sequencing. The analysis of thousands of tags gives rise to a representational profile of the abundance and frequency of SSRs within the DNA pool, from which low copy sequences can be identified. As each tag contains sufficient nucleotide sequence for primer design, their conversion into PCR primers allows the amplification of corresponding full-length fragments from the pool of SSR amplicons. These fragments permit the full characterisation of an SSR locus and provide flanking sequence for the development of a microsatellite marker. Alternatively, sequence tag primers can be used to directly amplify corresponding SSR loci from genomic DNA, thereby reducing the cost of developing a microsatellite marker to the synthesis of just one sequence-specific primer. We demonstrate the utility of STMP by the development of SSR markers in bread wheat.

9.
10.
We present dial-out PCR, a highly parallel method for retrieving accurate DNA molecules for gene synthesis. A complex library of DNA molecules is modified with unique flanking tags before massively parallel sequencing. Tag-directed primers then enable the retrieval of molecules with desired sequences by PCR. Dial-out PCR enables multiplex in vitro clone screening and is a compelling alternative to in vivo cloning and Sanger sequencing for accurate gene synthesis.

11.
Primer-design for multiplexed genotyping   Cited by: 8 (self-citations: 1, citations by others: 8)
Single-nucleotide polymorphism (SNP) analysis is a powerful tool for mapping and diagnosing disease-related alleles. Mutation analysis by polymerase-mediated single-base primer extension (minisequencing) can be massively parallelized using DNA microchips or flow cytometry with microspheres as solid support. By adding a unique oligonucleotide tag to the 5′ end of the minisequencing primer and attaching the complementary antitag to the array or bead surface, the assay can be ‘demultiplexed’. Such high-throughput scoring of SNPs requires a high level of primer multiplexing in order to analyze multiple loci in one assay, thus enabling inexpensive and fast polymorphism scoring. We present a computer program to automate the design process for the assay. Oligonucleotide primers for the reaction are automatically selected by the software, a unique DNA tag/antitag system is generated, and the pairing of primers and DNA tags is done automatically in a way that avoids any cross-reactivity. We report results on a 45-plex genotyping assay, indicating that minisequencing can be adapted to be a powerful tool for high-throughput, massively parallel genotyping. The software is available to academic users on request.

12.
13.
We have developed a new and simple method for quantitatively analyzing global gene expression profiles from cells or tissues. The process, called TALEST, or tandem arrayed ligation of expressed sequence tags, employs an oligonucleotide adapter containing a type IIs restriction enzyme site to facilitate the generation of short (16 bp) ESTs of fixed position in the mRNA. These ESTs are flanked by GC-clamped punctuation sequences which render them resistant to thermal denaturation, allowing their concatenation into long arrays and subsequent recognition and analysis by high-throughput DNA sequencing. A major advantage of the TALEST technique is the avoidance of PCR in all stages of the process and hence the attendant sequence-specific amplification biases that are inherent in other gene expression profiling methods such as SAGE, Differential Display, AFLP, etc. which rely on PCR.

14.
15.
Wang H, Fang J, Liang C, He M, Li Q, Chu C. BioTechniques, 2011, 51(6): 421-423
SiteFinding-PCR is a method for isolating flanking sequence tags (FSTs) of T-DNA insertion lines, but the efficiency needs to be improved. Here we report a computation-assisted design for the random primers used in SiteFinding-PCR. A short sequence, GCATG, was screened from the rice genome and used as the 3' end of the random primer. When applying the optimized primer for isolating FSTs from 168 transgenic rice lines, we obtained 107 specific products, including 64 FSTs. The efficiency of obtaining FSTs using the modified version of SiteFinding-PCR increased by 73.0% compared with the method previously reported (P < 0.01, μ test). We also provide computational results for several other plant species such as maize, sorghum, Arabidopsis, foxtail millet, and Brachypodium based on the available genome data, so that the modified method could be easily adapted to other species.
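The abstract does not state the criteria by which GCATG was screened from the rice genome. One plausible first step in such a computation-assisted design is to tabulate how often each short sequence occurs in the genome, and the sketch below shows that counting step only. The function name, the default `k=5`, and the choice to skip windows containing ambiguous bases are assumptions for illustration, not the published procedure.

```python
from collections import Counter


def count_kmers(sequence, k=5):
    """Count overlapping k-mers, skipping windows with non-ACGT bases."""
    sequence = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


# Toy input standing in for a genome sequence.
counts = count_kmers("GCATGCATG", k=5)
print(counts.most_common(3))
```

On a real genome the same tally would be computed per chromosome from a FASTA file, and the ranking would then be combined with whatever spacing or specificity criteria the primer design requires.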

16.
Xu K, Doak TG, Lipps HJ, Wang J, Swart EC, Chang WJ. Gene, 2012, 498(1): 75-80
Genome-wide methylation studies frequently lack adequate controls to estimate proportions of background reads in the resulting datasets. To generate appropriate control pools, we developed a technique termed nMETR (non-methylated tag recovery) based on digestion of genomic DNA with a methylation-sensitive restriction enzyme, ligation of an adapter oligonucleotide, and PCR amplification of non-methylated sites associated with genomic repetitive elements. The protocol takes only two working days to generate amplicons for deep sequencing. We applied nMETR to human DNA using the BspFNI enzyme and retrotransposon Alu-specific primers. 454-sequencing enabled identification of 1113 nMETR tag sites, of which ~65% were parts of CpG islands. Representation of reads inversely correlated with methylation levels, thus confirming nMETR fidelity. We created software that eliminates background reads and enables mapping and annotation of individual tags on the human genome. nMETR tags may serve as controls for large-scale epigenetic studies and for identifying unmethylated transposable elements located close to genomic CpG islands.

17.
We have localized 38 human brain cDNA sequences to individual human chromosomes. PCR primers were designed from expressed sequence tags and tested for specific amplification from human genomic DNA. The sizes of amplification products from DNA of somatic cell hybrid mapping panels were determined electrophoretically using an automated fluorescence detection system. Chromosomal assignments were made by discordancy analysis.
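Discordancy analysis, as used above for chromosomal assignment, compares the pattern of PCR positives across hybrid cell lines with the known chromosome content of each line. The sketch below assumes a simple scoring rule: count the hybrids in which marker presence and chromosome presence disagree, then choose the chromosome with the fewest disagreements. The data layout and function name are hypothetical, and the abstract does not specify the panel or thresholds the authors used.

```python
def assign_chromosome(marker_pattern, panel):
    """Pick the chromosome whose presence pattern best matches the marker.

    marker_pattern: dict mapping hybrid name -> True if a PCR product
        of the expected size was observed in that hybrid.
    panel: dict mapping chromosome -> dict of hybrid name -> True if the
        hybrid retains that chromosome.
    Returns (best_chromosome, discordancy_counts).
    """
    scores = {}
    for chromosome, content in panel.items():
        # A discordance is a hybrid where marker and chromosome disagree.
        scores[chromosome] = sum(
            marker_pattern[hybrid] != content[hybrid]
            for hybrid in marker_pattern
        )
    best = min(scores, key=scores.get)
    return best, scores


# Toy panel with three hybrids and two candidate chromosomes.
panel = {
    "1": {"h1": True, "h2": False, "h3": True},
    "2": {"h1": False, "h2": True, "h3": True},
}
marker = {"h1": True, "h2": False, "h3": True}
print(assign_chromosome(marker, panel))
```

In practice a confident assignment needs one chromosome with zero or very few discordances while every other chromosome shows clearly more.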

18.
Filtration techniques in the form of rapid elimination of candidate sequences while retaining the true one are key ingredients of database searches in genomics. Although SEQUEST and Mascot perform a conceptually similar task to the tool BLAST, the key algorithmic idea of BLAST (filtration) was never implemented in these tools. As a result, MS/MS protein identification tools are becoming too time-consuming for many applications, including the search for post-translationally modified peptides. Moreover, matching millions of spectra against all known proteins will soon make these tools too slow in the same way that "genome vs genome" comparisons instantly made BLAST too slow. We describe the development of filters for MS/MS database searches that dramatically reduce the running time and effectively remove the bottlenecks in searching the huge space of protein modifications. Our approach, based on a probability model for determining the accuracy of sequence tags, achieves superior results compared to GutenTag, a popular tag generation algorithm. Our tag generating algorithm along with our de novo sequencing algorithm PepNovo can be accessed via the URL http://peptide.ucsd.edu/.

19.
In shotgun proteomics, tandem mass spectra of peptides are typically identified through database search algorithms such as Sequest. We have developed DirecTag, an open-source algorithm to infer partial sequence tags directly from observed fragment ions. This algorithm is unique in its implementation of three separate scoring systems to evaluate each tag on the basis of peak intensity, m/z fidelity, and complementarity. In data sets from several types of mass spectrometers, DirecTag reproducibly exceeded the accuracy and speed of InsPecT and GutenTag, two previously published algorithms for this purpose. The source code and binaries for DirecTag are available from http://fenchurch.mc.vanderbilt.edu.

20.
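The MAST method uses DNA tag sequences from an artificial library to identify accessible sites in mRNA. Large numbers of tag sequences are amplified, cloned, and sequenced to produce a map of mRNA binding sites. We designed a single-primer PCR in which the primer binds to both ends of the tag sequences and bridges them. During amplification, the DNA tag sequences are joined through the action of the bridging primer, and the joined tag sequences are then cloned and sequenced. A dozen or so of these joined products contain more than a thousand tag sequences. This PCR method is simple and efficient, and can be used to sequence tag sequences in a high-throughput manner.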
