Similar documents
20 similar documents found
1.
A frameshift error detection algorithm for DNA sequencing projects. Citations: 2 (self-citations: 1; by others: 2)
During the determination of DNA sequences, frameshift errors are not the most frequent but they are the most bothersome, as they corrupt the amino acid sequence over several residues. Detection of such errors by sequence alignment is only possible when related sequences are found in the databases. To avoid this limitation, we have developed a new tool based on the distribution of non-overlapping 3-tuples or 6-tuples in the three frames of an ORF. The method relies upon the result of a correspondence analysis. It has been extensively tested on Bacillus subtilis and Saccharomyces cerevisiae sequences and has also been examined with human sequences. The results indicate that it can detect frameshift errors affecting as few as 20 bp with a low rate of false positives (no more than 1.0/1000 bp scanned). The proposed algorithm can be used to scan a large collection of data, but it is mainly intended for laboratory practice as a tool for checking the quality of the sequences produced during a sequencing project.
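The idea of comparing 3-tuple distributions across the three frames can be sketched as follows. This is a simplified illustration, not the authors' method: it replaces their correspondence analysis with a plain cosine similarity against a reference codon profile, and the names `codon_profile`, `best_frame_per_window`, and the `window`/`step` defaults are hypothetical. Windows whose best-matching frame is not 0 are flagged as candidate frameshift regions.

```python
from collections import Counter
from itertools import product
import math

# All 64 possible non-overlapping 3-tuples (codons).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_profile(seq: str, frame: int) -> list[float]:
    """Relative frequencies of non-overlapping 3-tuples read in `frame` (0, 1 or 2)."""
    counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    total = sum(counts[c] for c in CODONS) or 1  # tuples containing N etc. are ignored
    return [counts[c] / total for c in CODONS]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_frame_per_window(orf: str, reference: list[float],
                          window: int = 60, step: int = 30) -> list[tuple[int, int]]:
    """For each window, report (start, frame) for the frame that best matches `reference`.

    `step` should be a multiple of 3 so window frames stay aligned with the ORF start.
    """
    calls = []
    for start in range(0, len(orf) - window + 1, step):
        chunk = orf[start:start + window]
        scores = [cosine(codon_profile(chunk, f), reference) for f in range(3)]
        calls.append((start, max(range(3), key=lambda f: scores[f])))
    return calls
```

In practice `reference` would be built with `codon_profile(known_coding_sequence, 0)` from trusted genes of the same organism; after a single-base deletion, downstream windows tend to match frame 2 instead of frame 0.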

2.
GEL--a computer tool for DNA sequencing projects. Citations: 1 (self-citations: 0; by others: 1)
The GEL program for entry and analysis of DNA sequencing information is discussed, and examples of interaction with the program are presented. The current version of the program is the latest of several revisions to the first GEL program, reported previously in this journal (1). Improvements and additions have made the current GEL a particularly useful laboratory tool for molecular biologists engaged in DNA sequencing projects.

3.
4.
MOTIVATION: Investigators use gap estimates when planning DNA sequencing projects. Standard theories assume sequences are independently and identically distributed, leading to appreciable under-prediction of gaps. RESULTS: Using a statistical scaling factor and data from 20 representative whole genome shotgun projects, we construct regression equations that relate coverage to a normalized gap measure. For prokaryotic genomes the gap measure does not correlate with sequence coverage, while eukaryotic genomes show a strong correlation if the chaff is ignored. Gaps decrease at an exponential rate of only about one-third of that predicted by theory alone. Case studies suggest that the departure from theory can largely be attributed to assembly difficulties for repeat-rich genomes, but bias and coverage anomalies are also important when repeats are sparse. Such factors cannot be readily characterized a priori, suggesting upper limits on the accuracy of gap prediction. We also find that the diminishing coverage probability discussed in other studies is a theoretical artifact that does not arise for the typical project.
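For context, the "standard theory" gap estimate is usually the classical Lander-Waterman model. The sketch below computes its textbook expectations; it does not implement the authors' regression or scaling factor, and the function name and `min_overlap` parameter are illustrative assumptions.

```python
import math


def lander_waterman(genome_len: int, n_reads: int, read_len: int,
                    min_overlap: int = 0) -> dict[str, float]:
    """Classical Lander-Waterman expectations for a random shotgun project.

    Assumes reads start independently and uniformly along the genome, i.e. the
    i.i.d. assumption that the abstract identifies as causing gap under-prediction.
    """
    c = n_reads * read_len / genome_len        # redundancy (coverage)
    sigma = 1 - min_overlap / read_len         # fraction of a read usable to detect overlap
    return {
        "coverage": c,
        "expected_contigs": n_reads * math.exp(-c * sigma),  # roughly the number of gaps
        "uncovered_fraction": math.exp(-c),
    }
```

Since the abstract reports that real gaps shrink at only about one-third of the theoretical exponential rate, numbers like these are best read as optimistic.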

5.
The rate limiting step in a large-scale sequencing project is the generation of single-stranded DNA templates. We describe a fast, semiautomated procedure, using 96-well microtitre plates, in which 192 templates can be readily prepared in 1 day. The technique can be carried out manually or can be semiautomated using a robot pipetting device. We also provide evidence for the reliability and applicability of this method to a large-scale sequencing project.

6.
MOTIVATION: With the potential availability of nanopore devices that can sense the bases of translocating single-stranded DNA (ssDNA), it is likely that 'reads' of length approximately 10^5 will be available in large numbers and at high speed. We address the problem of complete DNA sequencing using such reads. We assume that approximately 10^2 copies of a DNA sequence are split into single strands that break into randomly sized pieces as they translocate the nanopore in arbitrary orientations. The nanopore senses and reports each individual base that passes through, but all information about orientation and complementarity of the ssDNA subsequences is lost. Random errors (both biological and transduction) in the reads create further complications. RESULTS: We have developed an algorithm that addresses these issues. It can be considered an extreme variation of the well-known Eulerian path approach. It searches over a space of de Bruijn graphs until it finds one in which (a) the impact of errors is eliminated and (b) both possible orientations of the two ssDNA sequences can be identified separately and unambiguously. Our algorithm is able to correctly reconstruct real DNA sequences of the order of 10^6 bases (e.g. the bacterium Mycoplasma pneumoniae) from simulated erroneous reads on a modest workstation in about 1 h. We describe, and give measured timings of, a parallel implementation of this algorithm on the Cray Multithreaded Architecture (MTA-2) supercomputer, whose architecture is ideally suited to this 'unstructured' problem. Our parallel implementation is crucial to the problem of rapidly sequencing long DNA sequences and also to the situation where multiple nanopores are used to obtain a high-bandwidth stream of reads.
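The basic building block, a de Bruijn graph that tolerates unknown read orientation and filters rare (likely erroneous) k-mers, can be sketched as below. This is only the elementary construction; the authors' search over a space of such graphs and the Eulerian path traversal are not shown, and `build_de_bruijn`, `k`, and `min_count` are hypothetical choices.

```python
from collections import Counter

# Complement table for A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_de_bruijn(reads: list[str], k: int, min_count: int = 2) -> dict[str, set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are observed k-mers.

    Each read is added in both orientations because its strand is unknown.
    k-mers seen fewer than `min_count` times are dropped as likely errors.
    """
    counts = Counter()
    for read in reads:
        for strand in (read, revcomp(read)):
            for i in range(len(strand) - k + 1):
                counts[strand[i:i + k]] += 1
    graph: dict[str, set[str]] = {}
    for kmer, n in counts.items():
        if n >= min_count:
            graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph
```

Raising `min_count` removes more error k-mers but can also break the graph in low-coverage regions, which is one reason a search over graph parameters is attractive.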

7.
We have developed a simple rapid plasmid DNA mini-preparation method which yields DNA of sufficient quality to be used in large scale sequencing projects. The method, which is a modification of the alkaline method of Birnboim and Doly (1979), requires less than two hours. We have eliminated the use of organic extractions, RNase digestion and alkaline denaturation of the DNA for annealing of the primer. The proportion of supercoiled plasmid DNA obtained is close to 100%. Greater than 80% of the clones yield at least 500 bp of sequence information per primer. The sequencing reactions from these double-stranded templates can be done on both strands using the universal and reverse sequence primers with the usual two reactions per primer, one to read close to the primer and one to read far from it. Thus, each clone yields at least 1 kb of sequence information. The preparation of the templates and the sequencing reactions can be done in less than three hours so that the sequencing gel can be run the same day.

8.
A system for shotgun DNA sequencing. Citations: 651 (self-citations: 197; by others: 651)
A multipurpose cloning site has been introduced into the gene for beta-galactosidase (beta-D-galactoside galactohydrolase, EC 3.2.1.23) on the single-stranded DNA phage M13mp2 (Gronenborn, B. and Messing, J., (1978) Nature 272, 375-377) with the use of synthetic DNA. The site contributes 14 additional codons and does not affect the ability of the lac gene product to undergo intracistronic complementation. Two restriction endonuclease cleavage sites in the viral gene II were removed by single base-pair mutations. Using the new phage M13mp7, DNA fragments generated by cleavage with a variety of different restriction endonucleases can be cloned directly. The nucleotide sequences of the cloned DNAs can be determined rapidly by DNA synthesis using chain terminators and a synthetic oligonucleotide primer complementary to the 15 bases preceding the new array of restriction sites.

9.
10.
11.
We present a framework for detecting probes in oligonucleotide microarrays that may add significant error to measurements in hybridization experiments. Four types of so-called degenerate probe behavior are considered: secondary structure formation, self-dimerization, cross-hybridization, and dimerization. The framework uses a well-established model for computing the free energy of nucleic acid sequence hybridization and a novel method for the detection of patterns in hybridization experiment data. Our primary result is the identification of unique patterns in hybridization experiment data that are shown to correlate with each type of degenerate probe behavior. A support function for identifying degenerate probes from a large set of hybridization experiments is given, along with some preliminary experimental results for the Affymetrix HuGeneFL GeneChip. Finally, we show a strong relationship between the Affymetrix discrimination measure for a probe and the free-energy estimate from theoretical models of hybridization. In particular, probes on the HuGeneFL GeneChip with high free-energy estimates (weak hybridization) almost always have a discrimination of approximately zero. The framework can be applied to any Affymetrix oligonucleotide array, and the software is made freely available to the community.
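Self-dimerization and cross-hybridization both depend on how long a stretch of one sequence can pair antiparallel with another. The sketch below is a crude base-pairing proxy, not the free-energy model the framework uses: it returns the longest common substring of `a` and the reverse complement of `b`. The function names and any cutoff applied to the result are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_complementary_run(a: str, b: str) -> int:
    """Length of the longest stretch of `a` that can pair antiparallel with `b`.

    Equal to the longest common substring of `a` and revcomp(b), found by
    dynamic programming. Use a == b as a self-dimer check and a != b as a
    cross-hybridization check.
    """
    rc = revcomp(b)
    best = 0
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if a[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the matching diagonal
                best = max(best, cur[j])
        prev = cur
    return best
```

A probe could then be flagged, for example, when `longest_complementary_run(p, p)` exceeds some chosen number of bases; a nearest-neighbor free-energy calculation would rank candidates far more realistically than this count.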

12.
A method is described for the rapid purification of high-quality lambda DNA. The method can be used with either liquid or plate lysates, on a small or a large scale. It relies on the preadsorption of all polyanions present in the lysate to an "insoluble" anion-exchange matrix (DEAE or TEAE). Phage particles are then disrupted by combined treatment with EDTA and proteinase K, and the resulting DNA is precipitated by adding the cationic detergent cetyltrimethylammonium bromide (CTAB, also called hexadecyltrimethylammonium bromide), which acts as a "soluble" anion-exchange matrix. The precipitated CTAB-DNA complex is then exchanged to Na-DNA and ethanol precipitated. The resulting purified DNA is suitable for enzymatic reactions and provides a high-quality template for dideoxy sequence analysis.

13.
A lack of pliant software tools that support small- to medium-scale DNA sequencing efforts is a major hindrance to recording and using laboratory workflow information to monitor the overall quality of data production. Here we describe VSQual, a set of Perl programs intended to provide simple and powerful tools to check several quality features of the sequencing data generated by automated DNA sequencing machines. The core program of VSQual is a flexible Perl-based pipeline, designed to be accessible and useful for both programmers and non-programmers. This pipeline directs the processing steps and can be easily customized for laboratory needs. Basically, the raw DNA sequencing trace files are processed by Phred and Cross_match; the outputs are then parsed, reformatted into Web-based graphical reports, and added to a Web site structure. The result is a set of real-time sequencing reports that are easily accessed and understood by laboratory staff. These reports facilitate the monitoring of DNA sequencing as well as the management of laboratory workflow, significantly reducing operational costs and ensuring high-quality and scientifically reliable results.
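As an illustration of the kind of per-read summary such a report might compute from Phred output, the sketch below parses quality values and derives simple metrics. It assumes quality values are available as FASTA-style text (a `>name` header followed by whitespace-separated integers); the function names, the Q20 threshold, and the metric names are assumptions, not VSQual's actual code. The only fixed fact used is the Phred definition Q = -10 log10(P_error).

```python
def parse_qual(text: str) -> dict[str, list[int]]:
    """Parse FASTA-style quality text: '>name' header lines followed by integer scores."""
    records: dict[str, list[int]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].extend(int(tok) for tok in line.split())
    return records


def phred_error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def quality_summary(quals: list[int], threshold: int = 20) -> dict[str, float]:
    """Read length, bases at or above `threshold`, and expected number of base errors."""
    return {
        "length": len(quals),
        "high_quality_bases": sum(q >= threshold for q in quals),
        "expected_errors": sum(phred_error_probability(q) for q in quals),
    }
```

Aggregating these summaries per plate or per run is the natural next step for a workflow report.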

14.
SUMMARY: Manual processing of DNA methylation data from bisulfite sequencing is a tedious and error-prone task. Here we present an interactive software tool that provides start-to-end support for this process. In an easy-to-use manner, the tool helps the user to import the sequence files from the sequencer, to align them, to exclude or correct critical sequences, to document the experiment, to perform basic statistics and to produce publication-quality diagrams. Emphasis is put on quality control: the program automatically assesses data quality and provides warnings and suggestions for dealing with critical sequences. The BiQ Analyzer program is implemented in the Java programming language and runs on any platform for which a recent Java virtual machine is available. AVAILABILITY: The program is available without charge for non-commercial users and can be downloaded from http://biq-analyzer.bioinf.mpi-inf.mpg.de/
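The core logic behind bisulfite methylation calling can be sketched as follows: after bisulfite treatment, unmethylated cytosines read as T while methylated cytosines remain C, so each CpG in the reference is scored by the base the read shows there. The conversion rate at non-CpG cytosines is a common quality-control metric. This is an illustration only, assuming a gap-free alignment to the plus strand; it is not BiQ Analyzer's implementation, and the function names are hypothetical.

```python
def call_cpg_methylation(reference: str, read: str) -> dict[int, str]:
    """Per-CpG calls for a bisulfite read aligned gap-free to the reference (plus strand).

    Returns {position: 'M' (methylated, C) | 'U' (unmethylated, T) | '?' (other base)}.
    """
    if len(reference) != len(read):
        raise ValueError("sketch assumes a gap-free alignment of equal length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            base = read[i]
            calls[i] = "M" if base == "C" else "U" if base == "T" else "?"
    return calls


def conversion_rate(reference: str, read: str) -> float:
    """Fraction of non-CpG reference cytosines read as T (ideally close to 1.0)."""
    positions = [i for i, b in enumerate(reference)
                 if b == "C" and (i + 1 == len(reference) or reference[i + 1] != "G")]
    if not positions:
        return float("nan")
    return sum(read[i] == "T" for i in positions) / len(positions)
```

A low conversion rate suggests incomplete bisulfite treatment, which would inflate apparent methylation; reads from the opposite strand would need G-to-A logic instead.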

15.
Double stranded DNA sequencing as a choice for DNA sequencing. Citations: 6 (self-citations: 0; by others: 6)

16.
Optimizing and monitoring the data flow in high-throughput sequencing facilities is important for data input and output, for tracking the status of results for the users of the facility, and to guarantee a good, high-quality service. In a multi-user system environment with different throughputs, each user wants to access his/her data easily, track his/her sequencing history, analyze sequences and their quality, and apply some basic post-sequencing analysis, without the necessity of installing further software. Recently, Fiocruz established such a core facility as a "technological platform". Infrastructure includes a 48-capillary 3730 DNA Sequence Analyzer (Applied Biosystems) and supporting equipment. The service includes running samples for large-scale users, performing DNA sequencing reactions and runs for medium and small users, and participation in partial or full genome projects. We implemented a workflow that fulfills these requirements for small and high throughput users. Our implementation also includes the monitoring of data for continuous quality improvement (reports by plate, month and user) by the sequencing staff. For the user, different analyses of the chromatograms, such as visualization of good quality regions, as well as processing, such as comparisons or assemblies, are available. So far, 180 users have made use of the service, generating 155,000 sequences, 35% of which were produced for the BCG Moreau-RJ genome project. The pipeline (named ChromaPipe for Chromatogram Pipeline) is available for download by the scientific community at the url http://bioinfo.pdtis.fiocruz.br/ChromaPipe/. The support for assembly is also configured as a web service: http://bioinfo.pdtis.fiocruz.br/Assembly/.
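One common way to define a read's "good quality region" is maximum-scoring-segment trimming, the idea behind Mott's algorithm, applied here directly to Phred scores. The abstract does not say how ChromaPipe determines these regions, so this is an assumed, illustrative approach; `good_quality_region` and the default threshold of 20 are hypothetical.

```python
def good_quality_region(quals: list[int], threshold: int = 20) -> tuple[int, int]:
    """Return the half-open interval [start, end) maximizing the sum of (q - threshold).

    Bases above the threshold add to the score and bases below subtract from it,
    so the result is the stretch that is, on balance, of highest quality.
    Returns (0, 0) if no base exceeds the threshold.
    """
    best_sum, best = 0, (0, 0)
    run_sum, run_start = 0, 0
    for i, q in enumerate(quals):
        run_sum += q - threshold
        if run_sum <= 0:
            run_sum, run_start = 0, i + 1   # restart after a net-negative stretch
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best
```

Unlike a simple "first and last base above Q20" rule, this tolerates an isolated low-quality base inside an otherwise good stretch, as with the 12 in the test data.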

17.
D L Mielke, M Russel. Gene, 1992, 118(1): 93-95
The TnphoA transposon constructed by Manoil and Beckwith [Proc. Natl. Acad. Sci. USA 82 (1985) 8129-8133] has been modified to permit easy isolation of single-stranded (ss) DNA of target plasmids. The intergenic region (IG) of filamentous phage f1, which consists of the phage origin of replication and packaging signal, was inserted into a nonessential region of TnphoA. This modified transposon should be useful for the analysis of genes cloned in plasmids that lack a filamentous phage IG. Transposition of TnphoA-IG into a plasmid carries the IG with it; subsequently, after infection with a filamentous helper phage, ss plasmid DNA suitable for sequence analysis and useful for oligodeoxyribonucleotide-mediated mutagenesis of TnphoA-generated fusions can be isolated. The utility of TnphoA-IG was confirmed by analysis of 'blue hops' into the bla (encoding beta-lactamase) and pspE (encoding phage shock protein) genes whose products are secreted into the Escherichia coli periplasm.

18.
The classical theory of shotgun DNA sequencing accounts for neither the placement dependencies that are a fundamental consequence of the forward-reverse sequencing strategy, nor the edge effect that arises for small to moderate-sized genomic targets. These phenomena are relevant to a number of sequencing scenarios, including large-insert BAC and fosmid clones, filtered genomic libraries, and macronuclear chromosomes. Here, we report a model that considers these two effects and provides both the expected value of coverage and its variance. Comparison to methyl-filtered maize data shows significant improvement over classical theory. The model is used to analyze coverage performance over a range of small to moderately sized genomic targets. We find that the read pairing effect and the edge effect interact in a non-trivial fashion. Shorter reads give superior coverage per unit sequence depth relative to longer ones. In principle, end-sequences can be optimized with respect to template insert length; however, optimal performance is unlikely to be realized in most cases because of inherent size variation in any set of targets. Conversely, single-stranded reads exhibit roughly the same coverage attributes as optimized end-reads. Although linking information is lost, single-stranded data should not pose a significant assembly liability if the target represents predominantly low-copy sequence. We also find that random sequencing should be halted at substantially lower redundancies than those now associated with larger projects. Given the enormous amount of data generated per cycle on pyrosequencing instruments, this observation suggests devising schemes to split each run cycle between two or more projects. This would prevent over-sequencing and would further leverage the pyrosequencing method.
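The pairing and edge effects can be explored numerically with a small Monte Carlo simulation and compared against the classical covered fraction 1 - e^(-c). Note that the abstract describes an analytic model; this simulation is a different, illustrative technique. It assumes fixed insert and read lengths and inserts placed uniformly so that they lie wholly inside a linear target; all names and parameters are hypothetical.

```python
import math
import random


def simulate_end_read_coverage(target_len: int, insert_len: int, read_len: int,
                               n_inserts: int, trials: int = 100, seed: int = 1) -> float:
    """Monte Carlo estimate of the fraction of a linear target covered by end reads.

    Each insert lies wholly inside the target (start chosen uniformly), and one read
    of `read_len` bases is taken from each end, as in forward-reverse sequencing.
    """
    if not read_len <= insert_len <= target_len:
        raise ValueError("need read_len <= insert_len <= target_len")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        covered = bytearray(target_len)            # 0 = uncovered, 1 = covered
        for _ in range(n_inserts):
            start = rng.randint(0, target_len - insert_len)
            end = start + insert_len
            covered[start:start + read_len] = b"\x01" * read_len   # forward read
            covered[end - read_len:end] = b"\x01" * read_len       # reverse read
        total += sum(covered) / target_len
    return total / trials


def classical_coverage(target_len: int, read_len: int, n_reads: int) -> float:
    """Classical expected covered fraction 1 - exp(-c), ignoring pairing and edges."""
    return 1 - math.exp(-n_reads * read_len / target_len)
```

To compare like with like, pass `2 * n_inserts` as `n_reads` to `classical_coverage`, since each insert contributes two reads; the gap between the two numbers shows how the dependencies described above shift coverage for a given target size.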

19.
20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417 (京ICP备09084417号)