Similar documents
20 similar documents found
1.
DNA sequence predicted from polyacrylamide gel-based technologies is inaccurate because of variations in the quality of the primary data, caused by limitations of the technology, and because of sequence-specific variations caused by nucleotide interactions within the DNA molecule and with the gel. The ability to recognize the probability of error in the primary data will be useful in reconstructing the target sequence of a DNA sequencing project and in estimating the accuracy of the final sequence. This paper describes the use of linear discriminant analysis to assign, for each predicted nucleotide position in primary sequence data generated by a gel-based DNA sequencing technology, position-specific probabilities of incorrect, over- and under-prediction of nucleotides. Using this method, most of the error potential in primary sequence data can be assigned to a limited number of discrete positions. The use of these probability values in the sequence reconstruction process and in estimating the accuracy of consensus sequence determination is described.
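A minimal sketch of the kind of calculation described above, assuming two hypothetical per-base features (for example, a normalized peak height and a peak-spacing ratio) and a training set of base calls already labeled correct or wrong. It fits a two-class linear discriminant with a pooled covariance matrix and converts the discriminant score into a posterior probability that a call is wrong. The features, function names, and the collapse to a single error class are illustrative assumptions; the paper assigns separate probabilities for incorrect, over- and under-prediction.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float]  # two hypothetical trace features per base call


def _mean(rows: Sequence[Vector]) -> Vector:
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def fit_lda(correct: Sequence[Vector], wrong: Sequence[Vector]):
    """Return (w, b) such that w.x + b is the log-odds that a base call is wrong."""
    m0, m1 = _mean(correct), _mean(wrong)
    # Pooled within-class covariance: LDA assumes both classes share it.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((correct, m0), (wrong, m1)):
        for x in rows:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(correct) + len(wrong) - 2
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # Discriminant direction w = S^-1 (m1 - m0).
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    prior_log_odds = math.log(len(wrong) / len(correct))
    b = -(w[0] * mid[0] + w[1] * mid[1]) + prior_log_odds
    return w, b


def error_probability(x: Vector, w: Vector, b: float) -> float:
    """Posterior P(wrong | x) under the shared-covariance Gaussian model."""
    z = w[0] * x[0] + w[1] * x[1] + b
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Summing these per-position probabilities over a read gives its expected number of errors, one simple way such values could feed into accuracy estimates.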

2.
A strategy for rapid DNA sequence acquisition in an ordered, nonrandom manner, which retains all of the conveniences of the dideoxy method with M13 transducing phage DNA as template, is described. Target DNA 3 to 14 kb in size can be stably carried by our M13 vectors. Suitable targets are stretches of DNA that lack a restriction site which is unique in our cloning vectors and adjacent to the sequencing primer; the sites currently usable in this way (that is, when absent from the target) are Pst, Xba, HindIII, BglII and EcoRI. By an in vitro procedure, we cut RF DNA once randomly and once specifically, creating thousands of deletions that start at the unique restriction site adjacent to the dideoxy sequencing primer and extend various distances across the target DNA. Phage carrying deletions of a desired size, whose DNA as template will yield sequence data from a desired location along the target DNA, can be purified alive by electrophoresis on agarose gels. Phage running at the same position on the agarose gel thus conveniently yield nucleotide sequence data from the same kilobase of target DNA.

3.
This paper describes a new way of storing DNA gel reading data and an accompanying set of computer programs. These programs perform all the manipulations required on data obtained by the so-called 'shotgun' method of DNA sequencing. The system simplifies the computer processing involved in this sequencing method and can, at any time during a project, display all the gel readings covering any section of the sequence, lined up in register.
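A minimal sketch of displaying gel readings "lined up in register", assuming each reading is stored with a hypothetical `offset` giving the contig position of its first base. The record layout and the function name `show_region` are illustrative, not the storage format of the programs described.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GelReading:
    """One gel reading placed on the contig (hypothetical record layout)."""
    name: str
    sequence: str
    offset: int  # 0-based contig position of the reading's first base


def show_region(readings: List[GelReading], start: int, end: int) -> List[str]:
    """Return the readings overlapping [start, end), padded so columns line up."""
    width = max(len(r.name) for r in readings)
    lines = []
    for r in sorted(readings, key=lambda rd: rd.offset):
        if r.offset + len(r.sequence) <= start or r.offset >= end:
            continue  # reading does not cover the requested window
        row = []
        for pos in range(start, end):
            i = pos - r.offset
            row.append(r.sequence[i] if 0 <= i < len(r.sequence) else " ")
        lines.append(f"{r.name.ljust(width)}  {''.join(row)}")
    return lines
```

For example, with readings `g1` (`ACGTAC` at offset 0) and `g2` (`TACGGA` at offset 3), `show_region(..., 0, 9)` prints the shared `TAC` of both readings in the same columns.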

4.
DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies.
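A simplified sketch of read-based contamination estimation, assuming array genotypes identify sites where the sample is homozygous and that population allele frequencies are known. At such sites, reads carrying the other allele can come only from sequencing error or from the contaminant, so their fraction is roughly alpha times the population frequency q of that allele. This method-of-moments estimator ignores sequencing error and is not the likelihood model used in the paper; the names are hypothetical.

```python
from typing import Iterable, Tuple

# (read depth, reads carrying the other allele, population frequency of that allele)
Site = Tuple[int, int, float]


def estimate_contamination(sites: Iterable[Site]) -> float:
    """Method-of-moments estimate of the contaminating DNA fraction alpha.

    Pass only sites where the sample's array genotype is homozygous. A random
    contaminant carries the other allele with probability q per allele, so
    E[k / n] is about alpha * q (sequencing error is ignored here).
    """
    observed = 0
    expected_per_unit_alpha = 0.0
    for depth, other, q in sites:
        observed += other
        expected_per_unit_alpha += depth * q
    if expected_per_unit_alpha == 0:
        raise ValueError("no informative sites")
    return observed / expected_per_unit_alpha
```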

5.
Procedures are presented for reliable and accurate nucleotide sequence analysis using as template supercoiled DNA prepared by a modified rapid boiling minipreparation protocol. This method yields DNA templates suitable for sequencing within 1 h of bacterial harvest. We describe optimal reaction conditions for supercoiled miniprep DNA sequencing using a modified T7 DNA polymerase (Sequenase) in dideoxynucleotide chain termination reactions. We demonstrate that under these conditions, the sequencing data obtained with miniprep DNA is indistinguishable from that obtained with CsCl purified supercoiled DNA or from that obtained using single stranded DNA templates. We further show that the supercoiled DNA sequencing reactions can be analyzed on a commercially available automated DNA sequencing system that detects 32P labeled DNA during its electrophoretic separation. Taken together, these developments represent a significant improvement in the process of nucleotide sequence analysis.

6.
L. H. Guo, R. Wu. Nucleic Acids Research, 1982, 10(6): 2065-2084
We describe improved enzymatic methods for sequencing DNA. They are based on partial digestion of duplex DNA with exonuclease III to produce DNA molecules with 3' ends shortened to varying lengths, followed by repair synthesis to extend and label the 3' ends. After asymmetrical cleavage of the DNA with a restriction enzyme, the labeled products are separated by gel electrophoresis and the sequence is read from the autoradiogram. The entire procedure, from unrestricted DNA through gel electrophoresis, takes only one day for sequencing both strands of the DNA molecule. These methods are especially suitable for sequencing DNA cloned in plasmid vectors, and they greatly extend the usefulness of the dideoxynucleotide chain termination method of Sanger et al. (Proc. Natl. Acad. Sci. USA 74, 5463, 1977). Using these methods we have determined the sequence of a 410 base pair fragment that includes the yeast SUP3 tyrosine tRNA gene.

7.
The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of the sequence generated in primate genome sequencing projects consists of this material, which, because of its highly repetitive nature, is fragmented or left unassembled in published genome sequences. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm that predicts potential higher-order array structure from paired-end sequence data, and we then validate its organization and distribution experimentally. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationships of these sequences and provide further support for a model of their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites between the Old World monkey and ape lineages.

8.

Background

Telomeres are the protective arrays of tandem TTAGGG sequence and associated proteins at the termini of chromosomes. Telomeres shorten at each cell division due to the end-replication problem and are maintained above a critical threshold in malignant cancer cells to prevent cellular senescence or apoptosis. With the recent advances in massively parallel sequencing, assessing telomere content in the context of other cancer genomic aberrations becomes an attractive possibility. We present the first comprehensive analysis of telomeric DNA content change in tumors using whole-genome sequencing data from 235 pediatric cancers.

Results

To measure telomeric DNA content, we counted telomeric reads containing TTAGGGx4 or CCCTAAx4 and normalized to the average genomic coverage. Changes in telomeric DNA content in tumor genomes were clustered using a Bayesian Information Criterion to determine loss, no change, or gain. Using this approach, we found that the pattern of telomeric DNA alteration varies dramatically across the landscape of pediatric malignancies: telomere gain was found in 32% of solid tumors, 4% of brain tumors and 0% of hematopoietic malignancies. The results were validated by three independent experimental approaches and revealed a significant association of telomere gain with the frequency of somatic sequence mutations and structural variations.

Conclusions

Telomere DNA content measurement using whole-genome sequencing data is a reliable approach that can generate useful insights into the landscape of the cancer genome. Measuring the change in telomeric DNA during malignant progression is likely to be a useful metric when considering telomeres in the context of the whole genome.
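The read-counting step described in the Results can be sketched as follows, assuming reads are plain strings and the mean genomic coverage has been computed separately. The log2 tumor/normal ratio is an illustrative summary with hypothetical names; the Bayesian Information Criterion clustering into loss, no change, and gain is not reproduced here.

```python
import math
import re
from typing import Iterable

# Four tandem copies of the telomeric repeat on either strand (TTAGGGx4 / CCCTAAx4).
TELOMERE_READ = re.compile("(?:TTAGGG){4}|(?:CCCTAA){4}")


def count_telomeric_reads(reads: Iterable[str]) -> int:
    """Number of reads containing at least four tandem telomeric repeats."""
    return sum(1 for read in reads if TELOMERE_READ.search(read.upper()))


def telomere_content(reads: Iterable[str], mean_coverage: float) -> float:
    """Telomeric read count normalized to the average genomic coverage."""
    return count_telomeric_reads(reads) / mean_coverage


def log2_change(tumor_content: float, normal_content: float) -> float:
    """Hypothetical summary: > 0 suggests gain, < 0 suggests loss in the tumor."""
    return math.log2(tumor_content / normal_content)
```

Normalizing by coverage lets a tumor and its matched normal, sequenced to different depths, be compared directly.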

9.
In recent years, large-scale sequencing approaches have allowed the genomic structures of various organisms to be determined rapidly, opening new applications of the resulting high-throughput sequence data across many fields of biology. We present here a pipeline that searches genomic sequence data produced by 454 pyrosequencing for microsatellite DNA and designs PCR primers flanking the microsatellites. We tested this pipeline, called ‘Auto-primer’, on several fish genomic sequences and obtained many diverse candidate microsatellite loci useful for detecting intraspecies genetic variability. This in silico search for microsatellite DNA is superior to conventional cloning methods, since repeat units of any sequence pattern can be screened.

10.
DNA sequence analysis by MALDI mass spectrometry.
Conventional DNA sequencing is based on gel electrophoretic separation of the sequencing products. Gel casting and electrophoresis are the time-limiting steps, and the gel separation is occasionally imperfect because of aberrant mobility of certain fragments, leading to erroneous sequence determination. Furthermore, illegitimately terminated products frequently cannot be distinguished from correctly terminated ones, a phenomenon that also obscures data interpretation. In the present work, MALDI mass spectrometry is implemented for sequencing DNA amplified from clinical samples. The unambiguous and fast identification of deletions and substitutions in DNA amplified from heterozygous carriers suggests that MALDI mass spectrometry is a realistic future alternative to conventional sequencing procedures for high-throughput mutation screening. Unique features of the method are demonstrated by sequencing a DNA fragment that could not be sequenced conventionally because of gel electrophoretic band compression and the presence of multiple non-specific termination products. Taking advantage of the accurate mass information provided by MALDI mass spectrometry, the sequence was deduced and the nature of the non-specific termination could be determined. The method described here increases the fidelity of DNA sequencing, is fast, is compatible with standard DNA sequencing procedures, and is amenable to automation.

11.
Automated DNA sequencing utilizing fluorescently labeled primers is a proven methodology for generating quality sequence data. However, for directed primer walking strategies this necessitates synthesis and labeling of a unique primer for each sequencing reaction. Here, we describe a rapid ligation-based method of generating labeled sequencing primers. An unlabeled 5'-phosphorylated sequencing primer is ligated to a fluorescent oligonucleotide by use of a bridge primer that is complementary to adjacent portions of both oligonucleotides, thus aligning them properly for ligation. The resulting fluorescent hybrid primer can be used directly in cycle sequencing reactions without any prior purification.
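The bridge primer design can be sketched as the reverse complement of the ligation junction, assuming the fluorescent oligonucleotide is joined upstream (5') of the 5'-phosphorylated sequencing primer, so that the primer's 3' end stays free for extension. The 10-nt overlap on each side is an assumed default, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def bridge_oligo(fluor_oligo: str, seq_primer: str, overlap: int = 10) -> str:
    """Bridge complementary to the 3' end of the fluorescent oligo and the 5' end
    of the phosphorylated sequencing primer, so the nick sits at their junction."""
    if overlap > len(fluor_oligo) or overlap > len(seq_primer):
        raise ValueError("overlap is longer than one of the oligonucleotides")
    junction = fluor_oligo[-overlap:] + seq_primer[:overlap]
    return reverse_complement(junction)
```

When the bridge anneals, the fluorescent oligo's 3'-OH and the primer's 5'-phosphate lie side by side, which is the substrate a DNA ligase joins.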

12.
Precise manipulation of genetic material, typical of modern experiments in molecular biology and new biotechnology, requires the ability to determine DNA base sequence. This ability now makes it possible to exploit specific genetic knowledge to dissect complex cellular processes and to modulate cell metabolism in transgenic organisms. The review focuses on the DNA sequencing technologies that are widespread in general laboratory practice; given the availability of commercial reagents, they can safely be called industrial techniques. Modern DNA sequencing requires repeatedly breaking large genomic DNA into smaller pieces, which are then amplified and sequenced, and reconstructing the original long stretch from the overlaps between the small pieces. The DNA sequencing process has several steps: a DNA fragment is obtained in sufficient quantity and purity; it is converted to a form suitable for a particular sequencing method; a sequencing reaction is performed and its products are fractionated; and finally the resulting data are interpreted (i.e., an autoradiograph is read into computer memory) and a long sequence is reconstructed from overlapping short stretches. These steps are considered in separate parts, with emphasis on sequencing strategies in relation to their biological task. The last part considers possibilities for automating sequencing experiments, followed by a discussion of domestic problems in DNA sequencing.
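The final step, reconstructing a long sequence from overlapping short stretches, can be illustrated with a greedy overlap-merge sketch. It assumes error-free reads from one strand and a hypothetical minimum overlap of 3 bases; real assemblers must also handle sequencing errors, both strands, and repeats.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair with the largest overlap; return the contigs."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments wholly contained in another fragment.
    contigs = [f for f in contigs if not any(f != g and f in g for g in contigs)]
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```

Greedy merging is simple but can join the wrong fragments when a repeat is longer than the overlap, one reason practical strategies favor redundant coverage.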

13.

Background

DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their positions identified. In some cases (e.g., with PacBio SMRT), the positions of the barcode and the surrounding DNA context are not well defined. Many reads start inside the genomic insert, so adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads, and we suggest possible improvements.

Results

In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.

Conclusion

Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow an appropriate trade-off between sensitivity and precision to be chosen.
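A simplified sketch of the idea, not the authors' implementation: each read is scored by the smallest edit distance between any barcode and its best-matching stretch near the read start; the same score computed on sequences known to lack barcodes (for example, reference DNA) serves as the null distribution; and the cutoff is the largest distance whose estimated tail-area FDR stays at or below a target. The 30-base search window, the null proportion fixed at 1, and all function names are assumptions.

```python
from typing import List, Sequence


def fitting_distance(barcode: str, read: str) -> int:
    """Edit distance of the whole barcode against its best-matching substring of read."""
    prev = [0] * (len(read) + 1)  # alignment may start anywhere in the read
    for i, b in enumerate(barcode, start=1):
        cur = [i] + [0] * len(read)
        for j, r in enumerate(read, start=1):
            cur[j] = min(prev[j - 1] + (b != r), prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)  # alignment may end anywhere in the read


def min_barcode_distance(read: str, barcodes: Sequence[str], window: int = 30) -> int:
    return min(fitting_distance(bc, read[:window]) for bc in barcodes)


def distance_cutoff(observed: List[int], null: List[int], alpha: float = 0.05) -> int:
    """Largest cutoff t with estimated tail-area FDR <= alpha (-1 if none).

    FDR(t) is estimated as F0(t) * N / #{observed <= t}, where F0 is the empirical
    null CDF and the proportion of unbarcoded reads is conservatively set to 1.
    """
    best = -1
    n = len(observed)
    for t in range(0, max(observed + null) + 1):
        called = sum(d <= t for d in observed)
        if called == 0:
            continue
        f0 = sum(d <= t for d in null) / len(null)
        if f0 * n / called <= alpha:
            best = t
    return best
```

Shorter barcodes or larger barcode sets shift the null distribution toward small distances, which lowers the usable cutoff, consistent with the Results above.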

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-264) contains supplementary material, which is available to authorized users.

14.
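DNALA is a method for protecting the privacy of personal DNA data. It protects personal DNA data effectively, but its data preprocessing is complex and its subsequent processing is not very accurate. This paper addresses these shortcomings of DNALA, resulting in the Savior algorithm. In the preprocessing stage, Savior replaces DNALA's multiple sequence alignment with pairwise sequence alignment, and in the subsequent processing it replaces DNALA's greedy strategy with random hill climbing, thereby overcoming the weaknesses of the original algorithm. Comparative experiments show that, for the same level of protection, Savior modifies the data less than DNALA does and spends less time on data preprocessing.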

15.
Sequencing of megabase plus DNA by hybridization: theory of the method
A mismatch-free hybridization of oligonucleotides containing from 11 to 20 monomers to unknown DNA represents, in essence, a sequencing of a complementary target. Realizing this, we have used probability calculations and, in part, computer simulations to estimate the types and numbers of oligonucleotides that would have to be synthesized in order to sequence a megabase plus segment of DNA. We estimate that 95,000 specific mixes of 11-mers, mainly of the 5'(A,T,C,G)(A,T,C,G)N8(A,T,C,G)3' type, hybridized consecutively to dot blots of cloned genomic DNA fragments would provide primary data for the sequence assembly. An optimal mixture of representative libraries in M13 vector, having inserts of (i) 7 kb, (ii) 0.5 kb genomic fragments randomly ligated in up to 10-kb inserts, and (iii) tandem "jumping" fragments 100 kb apart in the genome, will be needed. To sequence each million base pairs of DNA, one would need hybridization data from about 2100 separate hybridization sample dots. Inevitable gaps and uncertainties in alignment of sequenced fragments arising from nonrandom or repetitive sequence organization of complex genomes and difficulties in cloning "poisonous" sequences in Escherichia coli, inherent to large sequencing by any method, have been considered and minimized by choice of libraries and number of subclones used for hybridization. Because it is based on simpler biochemical procedures, our method is inherently easier to automate than existing sequencing methods. The sequence can be derived from simple primary data only by extensive computing. Phased experimental tests and computer simulations increasing in complexity are needed before accurate estimates can be made in terms of cost and speed of sequencing by the new approach. Nevertheless, sequencing by hybridization should show advantages over existing methods because of the inherent redundancy and parallelism in its data gathering.
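The principle that a complete set of hybridizing oligonucleotides determines the target can be illustrated with the Eulerian-path formulation on a de Bruijn graph, a later, standard way of reconstructing a sequence from its k-mer spectrum; it is not the probability-based procedure of this paper. The sketch assumes an error-free spectrum in which each k-mer is listed once per occurrence, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def sequence_from_spectrum(kmers: List[str]) -> str:
    """Reconstruct a sequence whose k-mers are exactly `kmers` (each used once)."""
    if not kmers:
        return ""
    graph: Dict[str, List[str]] = defaultdict(list)  # (k-1)-mer -> successors
    indeg: Dict[str, int] = defaultdict(int)
    for kmer in kmers:
        u, v = kmer[:-1], kmer[1:]
        graph[u].append(v)
        indeg[v] += 1
    nodes = set(graph) | set(indeg)
    # An Eulerian path starts where out-degree exceeds in-degree by one.
    start = next((n for n in nodes if len(graph[n]) - indeg[n] == 1), next(iter(graph)))
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(kmers) + 1:
        raise ValueError("spectrum does not define a single Eulerian path")
    return path[0] + "".join(node[-1] for node in path[1:])
```

If a (k-1)-mer occurs more than once in the target, several Eulerian paths can exist: with k = 3, ATGGCGTGCA and ATGCGTGGCA have identical spectra. This is the kind of repeat-induced ambiguity the abstract acknowledges.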

16.
17.
One of the main endeavors in today's life science remains the efficient sequencing of long DNA molecules. Today, most de novo sequencing of DNA is still performed using the electrophoresis-based Sanger concept of 1977, in spite of certain restrictions of this method. Methods using mass spectrometry to acquire the Sanger sequencing data are limited by short sequencing lengths of 15-25 nt. We propose a new method for DNA sequencing using base-specific cleavage and mass spectrometry that appears to be a promising alternative to classical DNA sequencing approaches. A single-stranded DNA or RNA molecule is cleaved by a base-specific (bio-)chemical reaction using, for example, RNases. The cleavage reaction is modified so that only a certain percentage of bases, rather than all of them, are cleaved. The resulting mixture of fragments is then analyzed using MALDI-TOF mass spectrometry, which gives the molecular masses of the fragments. For every peak in the mass spectrum, we calculate the base compositions that could create a peak of the observed mass and, repeating the cleavage reaction for all four bases, finally try to uniquely reconstruct the underlying sequence from these observed spectra. This leads us to the combinatorial problem of sequencing from compomers and, finally, to the graph-theoretical problem of finding a walk in a subgraph of the de Bruijn graph. Application of this method to simulated data indicates that it might be capable of sequencing DNA molecules of 200+ nt.
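A minimal sketch of the compomer set produced by partial base-specific cleavage, assuming (hypothetically) that cleavage occurs 3' of every occurrence of the chosen base and that any fragment between two cut points may be observed. Converting compomers to masses, and solving the reverse problem of reconstructing the sequence, are not shown; the function names are illustrative.

```python
from collections import Counter
from typing import List, Tuple

Compomer = Tuple[int, int, int, int]  # counts of A, C, G, T


def compomer(fragment: str) -> Compomer:
    """Base composition of a fragment; its mass (end chemistry aside) depends only on this."""
    counts = Counter(fragment)
    return (counts["A"], counts["C"], counts["G"], counts["T"])


def partial_cleavage_compomers(seq: str, base: str) -> List[Compomer]:
    # Hypothetical cleavage rule: cut 3' of every occurrence of `base`.
    cut_points = [i + 1 for i, b in enumerate(seq) if b == base and i + 1 < len(seq)]
    bounds = [0] + cut_points + [len(seq)]
    # Partial cleavage: any stretch between two bounds may survive as a fragment.
    return sorted({compomer(seq[bounds[a]:bounds[c]])
                   for a in range(len(bounds))
                   for c in range(a + 1, len(bounds))})
```

Since the spectrometer reports masses rather than sequences, fragments such as TAGC and CATG would be indistinguishable here, which is why the method repeats the cleavage for all four bases.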

18.
Nature Methods, 2005, 2(8): 629-630
This method is used to extend partial cDNA clones by amplifying the 5' sequences of the corresponding mRNAs (refs. 1-3). The technique requires knowledge of only a small region of sequence within the partial cDNA clone. During PCR, the thermostable DNA polymerase is directed to the appropriate target RNA by a single primer derived from the region of known sequence; the second primer required for PCR is complementary to a general feature of the target: in the case of 5' RACE, a homopolymeric tail added (via terminal transferase) to the 3' termini of cDNAs transcribed from a preparation of mRNA. This synthetic tail provides a primer-binding site upstream of the unknown 5' sequence of the target mRNA. The products of the amplification reaction are cloned into a plasmid vector for sequencing and subsequent manipulation.

19.
A 4.8 kilobase mouse embryo DNA fragment was inserted into a phage lambda genome and subsequently characterized by electron microscopy, restriction enzyme mapping and partial DNA sequencing. This fragment contains a 400 base sequence, spanning positions 3.3 to 3.7 kilobases from one end of the fragment, that is homologous to an immunoglobulin λ light chain mRNA. Restriction enzyme mapping and partial nucleotide sequencing of the 3' terminal part of the homology region confirm the previous conclusion [Tonegawa, Brack, Hozumi and Schuller, Proc. Natl. Acad. Sci. USA 74, 3518-3522 (1977)] that the cloned DNA fragment contains a Vλ gene sequence that is separate from any Cλ sequence.

20.
In this paper we show that restriction DNA fragments can prime DNA synthesis on a homologous supercoiled plasmid DNA. Using the dideoxyribonucleotide chain terminator method, newly synthesized truncated chains can be detached from the primers by restriction enzyme digestion. Therefore, by choosing DNA fragments flanked by two different restriction enzyme sites, nucleotide sequence information can be obtained simultaneously on both regions of the DNA surrounding the restriction fragment. The advantage of this sequencing approach over current methods is that no prior knowledge of the primary sequence is needed to determine the nucleotide sequence of a given DNA fragment. Thus, synthetic primers are not required, and internal sequences of a given clone can be easily accessed without fragmenting the original construct. The method has been used with rapid plasmid preparations, so considerable time and effort can be saved in gathering nucleotide sequence information.
