Similar articles
 20 similar articles found (search time: 15 ms)
1.
DNA sequencing with positive and negative errors.   Cited by: 7 (self-citations: 0, citations by others: 7)
The problem addressed in this paper is concerned with DNA sequencing by hybridization. An algorithm is proposed that solves a computational phase of this approach in the presence of both positive and negative errors resulting from the hybridization experiment. No a priori knowledge of the nature and source of these errors is required. An extensive set of computational experiments showed that the algorithm behaves surprisingly well if only positive errors appear. The general case, where positive and negative errors occur, can also be solved satisfactorily for an error rate up to 10%.
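
To make the two error types concrete, the sketch below builds the ideal spectrum of a short sequence and compares it with an observed one. It assumes the usual reading of the terms (a positive error is a k-mer reported by the experiment but absent from the sequence, a negative error is a k-mer present in the sequence but missed); the function names and the toy data are hypothetical and are not taken from the paper.

```python
def spectrum(sequence: str, k: int) -> set[str]:
    """Return the ideal spectrum: the set of all k-mers occurring in sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def classify_errors(ideal: set[str], observed: set[str]) -> tuple[set[str], set[str]]:
    """Split the disagreement between two spectra into (positive, negative) errors."""
    positive = observed - ideal   # reported by the experiment, absent from the sequence
    negative = ideal - observed   # present in the sequence, missed by the experiment
    return positive, negative


# Hypothetical toy data for illustration only.
target = "ACGTACGGT"
ideal = spectrum(target, k=4)
observed = (ideal - {"GTAC"}) | {"TTTT"}   # drop one true 4-mer, add one spurious 4-mer
positive, negative = classify_errors(ideal, observed)
```

In a real experiment the ideal spectrum is unknown, so a reconstruction algorithm has to infer which k-mers are spurious or missing; the sketch only illustrates the definitions.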

2.
MOTIVATION: A new heuristic algorithm for solving the DNA sequencing by hybridization problem with positive and negative errors. RESULTS: A heuristic algorithm providing better solutions than algorithms known from the literature based on the tabu search method.

3.
4.
A quality control algorithm for DNA sequencing projects.   Cited by: 2 (self-citations: 0, citations by others: 2)
Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries that are derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants are present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets but it is not apparently associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed with similarity searches using sequences from the relevant data sets.
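
The abstract does not spell out the statistical test, so the following is only a rough sketch of the underlying idea: profile each sequence by its hexamer frequencies and measure how far apart two profiles are. The distance used here (half the L1 distance) is a stand-in chosen for simplicity, and the function names and sequences are hypothetical.

```python
from collections import Counter


def hexamer_frequencies(sequence: str, k: int = 6) -> dict[str, float]:
    """Relative frequency of every k-mer (k = 6 gives hexamers) in sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def composition_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Half the L1 distance between two profiles (0 = identical, 1 = no shared k-mers)."""
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(key, 0.0) - b.get(key, 0.0)) for key in keys)


# Hypothetical toy sequences; real ESTs are far longer and far less regular.
host = hexamer_frequencies("ATATATATATATATAT")
same_source = hexamer_frequencies("TATATATATATATA")
foreign = hexamer_frequencies("GCGCGCGCGCGCGC")

d_same = composition_distance(host, same_source)
d_foreign = composition_distance(host, foreign)
```

A sequence whose profile lies unusually far from that of the expected organism would be flagged for follow-up; where to set that cut-off is exactly what a proper statistical test has to decide.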

5.
Two four-point testcrosses comprising 87,000 tomato plants were grown and the data collected from 28 subgroups. Each subgroup consisted of 2,000 or 5,000 plants and should give a valid estimate of the three recombination values. The 28 values for each interval give more outliers (23% are outside the 95% limits set by the standard deviation calculated by the binomial formula √(pq/n)) than would be expected by chance. If each subgroup were regarded as the control and the other groups tested against it, then 42% of the time the two subgroups would be significantly different. It is suggested that there are many cases in the literature where this comparison has been made and the significant difference wrongly ascribed to treatment. While the causes of these changes in recombination value are unknown and therefore uncontrollable, they must be anticipated in all such studies. Control and treatment must be replicated enough that chance extreme values will not be attributed to treatment.
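
The binomial limits mentioned in the abstract can be sketched as follows. The code computes the standard deviation √(pq/n) of an estimated recombination value and the share of subgroup estimates outside p ± 1.96 standard deviations. All numbers are hypothetical and are not the tomato data; a single subgroup size is assumed, whereas the study used subgroups of 2,000 or 5,000 plants.

```python
import math


def binomial_sd(p: float, n: int) -> float:
    """Standard deviation of an estimated proportion: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1.0 - p) / n)


def fraction_outside_limits(values: list[float], p: float, n: int, z: float = 1.96) -> float:
    """Share of subgroup estimates falling outside p +/- z * sd."""
    sd = binomial_sd(p, n)
    outside = [v for v in values if abs(v - p) > z * sd]
    return len(outside) / len(values)


# Hypothetical numbers for illustration only.
pooled_p = 0.20
subgroup_size = 2000
sd = binomial_sd(pooled_p, subgroup_size)
subgroup_values = [0.20, 0.21, 0.23, 0.17, 0.19]
share = fraction_outside_limits(subgroup_values, pooled_p, subgroup_size)
```

Under pure binomial sampling about 5% of subgroups should fall outside these limits, which is the benchmark against which the reported 23% is judged.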

6.
To advance the Human Genome Project, we have developed an automated DNA sequencing system, HUGA-I. It is composed of several automated instruments and the transfer robots connecting them. In this paper we describe the results of the performance evaluation test of HUGA-I. Although some of the system units performed well, the total performance of HUGA-I was about 1/6 of the designed value. By revealing the principal reasons for this poor performance, we would like to contribute to automation in genome analysis, particularly in human genome analysis. Because sequencing technology has advanced remarkably in recent years, the system units of HUGA-I are now older than those commercially available, and its throughput falls short of our expectations. Nevertheless, we believe that it is meaningful to report the exact performance of HUGA-I and to present the bottlenecks in automating the sequencing process, because automation in gene analysis is extremely important, in particular for the analysis of large genomes such as the human genome. The aims of this paper are to present the results of the performance evaluation of HUGA-I and to elucidate the bottlenecks in the automation of sequencing processes. The authors express their sincere thanks to Mr. Morisada Hayakawa and Mrs. Nobuko Kato for their technical assistance.

7.
The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling. However, in the particular case of SNP calling, a great number of false-positive SNPs may be obtained. One needs to distinguish putative SNPs from sequencing or other errors. We found that not only the probability of sequencing errors (i.e. the quality value) is important to distinguish an FP-SNP but also the conditional probability of "correcting" this error (the "second best call" probability, conditional on that of the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the rate of FP-SNPs is to retrieve DNA motifs that seem to be prone to sequencing errors, and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequence errors and candidate SNPs, based on a base call's nucleotide context and its mismatch type. In addition, we suggested a simple method to correct the majority of mismatches, based on conditional probability of their "second" best intensity call. We attach a corresponding second call confidence (quality value) of being corrected to each mismatch.
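
Two ideas from this abstract lend themselves to a small sketch. The first function uses the standard Phred relation between a quality value and an error probability. The second is a deliberately simplified, hypothetical heuristic: a mismatch counts as "correctable" when the second-best call agrees with the reference. The paper's actual measures also use conditional quality values and nucleotide context, which are not modelled here.

```python
def phred_to_error_probability(q: float) -> float:
    """Standard Phred relation: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def classify_mismatch(reference_base: str, first_call: str, second_call: str) -> str:
    """Label a site by comparing the first and second base calls with the reference."""
    if first_call == reference_base:
        return "match"
    if second_call == reference_base:
        return "correctable mismatch"   # likely a sequencing error under this heuristic
    return "candidate SNP"


# Hypothetical calls at three sites.
labels = [
    classify_mismatch("A", "A", "C"),
    classify_mismatch("A", "G", "A"),
    classify_mismatch("A", "G", "T"),
]
```

A quality value of 20 corresponds to an error probability of 1 in 100, and 30 to 1 in 1,000.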

8.
9.
PCR and DNA sequencing   Cited by: 5 (self-citations: 0, citations by others: 5)
Specific DNA segments defined by the sequence of two oligonucleotides can be enzymatically amplified up to a millionfold using the polymerase chain reaction (PCR). One of the most significant uses of this technique is for generation of sequencing templates, either from cloned inserts or directly from genomic DNA. To avoid the problem of reassociation of the linear DNA strands in the sequencing reaction, ssDNA templates can be produced directly in the PCR or generated directly from dsDNA by enzymatic treatment, electrophoretic separation or affinity purification. By combining PCR with direct sequencing, both the amplification and the sequencing reaction can be performed in the same vial. Finally, use of fluorescently labeled terminators or sequencing primers will allow the whole procedure to be amenable to complete automation.

10.
We introduce Quake, a program to detect and correct errors in DNA sequencing reads. Using a maximum likelihood approach incorporating quality values and nucleotide specific miscall rates, Quake achieves the highest accuracy on realistically simulated reads. We further demonstrate substantial improvements in de novo assembly and SNP detection after using Quake. Quake can be used for any size project, including more than one billion human reads, and is freely available as open source software from .

11.
A lack of pliant software tools that support small- to medium-scale DNA sequencing efforts is a major hindrance for recording and using laboratory workflow information to monitor the overall quality of data production. Here we describe VSQual, a set of Perl programs intended to provide simple and powerful tools to check several quality features of the sequencing data generated by automated DNA sequencing machines. The core program of VSQual is a flexible Perl-based pipeline, designed to be accessible and useful for both programmers and non-programmers. This pipeline directs the processing steps and can be easily customized for laboratory needs. In essence, the raw DNA sequencing trace files are processed by Phred and Cross_match, then the outputs are parsed, reformatted into Web-based graphical reports, and added to a Web site structure. The result is a set of real-time sequencing reports that are easily accessible to and understood by laboratory staff. These reports facilitate the monitoring of DNA sequencing as well as the management of laboratory workflow, significantly reducing operational costs and ensuring high quality and scientifically reliable results.

12.
13.
With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs, Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%.

14.
M. Ya. Azbel, Biopolymers, 1980, 19(1): 95-109
We show that the fine oscillatory structure of the DNA melting curve can be used to determine explicitly the nucleotide composition and the order of certain domains within the DNA. If DNA is specifically fragmented, the order of fragments can be learned directly from a comparison of the differential melting curves of the nonfragmented and fragmented DNA. The indicated information may complement exact methods of DNA sequencing. The proposed analysis is applied to bacteriophage φX-174, whose melting curve is known. Compared to the known φX-174 DNA sequence, the results of the analysis are found to be very accurate.

15.
DNA sequencing and gene structure   Cited by: 11 (self-citations: 0, citations by others: 11)

16.
SUMMARY: Manual processing of DNA methylation data from bisulfite sequencing is a tedious and error-prone task. Here we present an interactive software tool that provides start-to-end support for this process. In an easy-to-use manner, the tool helps the user to import the sequence files from the sequencer, to align them, to exclude or correct critical sequences, to document the experiment, to perform basic statistics and to produce publication-quality diagrams. Emphasis is put on quality control: the program automatically assesses data quality and provides warnings and suggestions for dealing with critical sequences. The BiQ Analyzer program is implemented in the Java programming language and runs on any platform for which a recent Java virtual machine is available. AVAILABILITY: The program is available without charge for non-commercial users and can be downloaded from http://biq-analyzer.bioinf.mpi-inf.mpg.de/

17.
DNA microarray and next-generation DNA sequencing technologies are important tools for high-throughput genome research, in revealing both the structural and functional characteristics of genomes. In the past decade the DNA microarray technologies have been widely applied in the studies of functional genomics, systems biology and pharmacogenomics. The next-generation DNA sequencing method was first introduced by the 454 Company in 2003, immediately followed by the establishment of the Solexa and SOLiD techniques by other biotech companies. Although this technology emerged only recently, its rapid and impressive improvement has extended its application to almost all fields of genomics research, where it rivals the existing DNA microarray technology. This paper briefly reviews the working principles of these two technologies as well as their application and perspectives in genome research. Supported by the National High-Tech Research Program of China (Grant No. 2006AA020704) and Shanghai Science and Technology Commission (Grant No. 05DZ22201)

18.
SAGE data are obtained by sequencing short DNA tags. Because of errors in DNA sequencing, SAGE data contain errors. We propose a new approach to identify tags whose abundance is biased by sequencing errors. This approach is based on a concept of neighbourhood: abundant tags can contaminate tags whose sequence is very close. The application of our approach reveals that moderately abundant tags can be generated solely by sequencing errors. It also allows for detecting correct rare tags. AVAILABILITY: Software is available only to non-profit entities and for non-commercial purposes upon request.
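
The neighbourhood idea can be illustrated with a minimal sketch. It assumes that "very close" means a single substitution and flags a tag when its count is small relative to that of an abundant neighbour. The 5% ratio, the function names and the tag counts are hypothetical; the abstract does not give the authors' actual criterion.

```python
def one_mismatch_neighbours(tag: str, alphabet: str = "ACGT") -> set[str]:
    """All tags that differ from tag by exactly one substitution."""
    neighbours = set()
    for i, base in enumerate(tag):
        for other in alphabet:
            if other != base:
                neighbours.add(tag[:i] + other + tag[i + 1:])
    return neighbours


def suspect_tags(counts: dict[str, int], ratio: float = 0.05) -> set[str]:
    """Tags whose count is at most ratio times the count of some one-mismatch neighbour."""
    suspects = set()
    for tag, count in counts.items():
        for neighbour in one_mismatch_neighbours(tag):
            if counts.get(neighbour, 0) * ratio >= count:
                suspects.add(tag)
                break
    return suspects


# Hypothetical tag counts for illustration only.
counts = {"ACGTACGTAC": 1000, "ACGTACGTAA": 12, "TTTTGGGGCC": 3}
suspects = suspect_tags(counts)
```

In this toy data the tag with 12 copies sits next to one with 1,000 copies and is flagged, while the rarest tag has no abundant neighbour and is retained, mirroring the abstract's point that rare tags can be correct and moderately abundant ones can be artefacts.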

19.
Over the past few years, technological advances in automated DNA sequencing have had a profound effect on the nature of DNA sequencing laboratories. To characterize the changes occurring within DNA sequencing facilities, the DNA Sequencing Research Group conducted three previous studies, in 1998, 2000, and 2003. A new general survey has been designed and conducted by the DSRG to capture the current status of DNA sequencing facilities in all sectors. Included were questions regarding facility administration, pricing, instrumentation, technology, protocols, and operation. The results of the survey are presented here, accompanied by comparisons to the previous surveys. These comparisons formed a basis for the discussion of trends within the facilities in response to the dynamics of a changing technology.

20.
Sequencing by hybridization (SBH) is a DNA sequencing technique, in which the sequence is reconstructed using its k-mer content. This content, which is called the spectrum of the sequence, is obtained by hybridization to a universal DNA array. Standard universal arrays contain all k-mers for some fixed k, typically 8 to 10. Currently, in spite of its promise and elegance, SBH is not competitive with standard gel-based sequencing methods. This is due to two main reasons: lack of tools to handle realistic levels of hybridization errors and an inherent limitation on the length of uniquely reconstructible sequence by standard universal arrays. In this paper, we deal with both problems. We introduce a simple polynomial reconstruction algorithm which can be applied to spectra from standard arrays and has provable performance in the presence of both false negative and false positive errors. We also propose a novel design of chips containing universal bases that differs from the one proposed by Preparata et al. (1999). We give a simple algorithm that uses spectra from such chips to reconstruct with high probability random sequences of length lower only by a squared log factor compared to the information theoretic bound. Our algorithm is very robust to errors and has a provable performance even if there are both false negative and false positive errors. Simulations indicate that its sensitivity to errors is also very small in practice.
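
As a rough illustration of reconstructing a sequence from its spectrum, the sketch below performs an exhaustive search over an error-free spectrum. It is not the polynomial algorithm of the paper and does not handle false positives or false negatives; it assumes a non-empty spectrum with k of at least 2 and a known target length, and the function name and example are hypothetical.

```python
def reconstruct(spectrum: set[str], length: int) -> list[str]:
    """All sequences of the given length whose k-mer set equals spectrum (exhaustive search)."""
    k = len(next(iter(spectrum)))
    results = []

    def extend(sequence: str, used: set[str]) -> None:
        if len(sequence) == length:
            if used == spectrum:          # every k-mer of the spectrum was consumed
                results.append(sequence)
            return
        for base in "ACGT":
            kmer = sequence[-(k - 1):] + base   # k-mer formed by appending this base
            if kmer in spectrum:
                extend(sequence + base, used | {kmer})

    for start in sorted(spectrum):        # try every k-mer as the first one
        extend(start, {start})
    return results


# Hypothetical example: the 3-mer spectrum of "ACGTTGCA".
example_spectrum = {"ACG", "CGT", "GTT", "TTG", "TGC", "GCA"}
candidates = reconstruct(example_spectrum, length=8)
```

When the search returns more than one candidate, the spectrum does not determine the sequence uniquely, which is the length limitation of standard arrays that the abstract refers to.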
