Found 20 similar articles (search time: 0 ms)
1.
Assembling millions of short DNA sequences using SSAKE (total citations: 7; self-citations: 0; citations by others: 7)
Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Due to ubiquitous repeats in large genomes and the inability of short sequences to uniquely and unambiguously characterize them, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. Availability: http://www.bcgsc.ca/bioinfo/software/ssake.
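The overlap search described in this abstract can be illustrated with a minimal sketch. It is not SSAKE's code: the function name `greedy_extend`, the `min_overlap` parameter, and the use of a flat prefix dictionary in place of a real prefix tree are all assumptions made for illustration. It extends a seed in the 3' direction only and tries the longest overlap first.

```python
from collections import defaultdict


def greedy_extend(seed, reads, min_overlap=3):
    """Extend `seed` at its 3' end using the longest suffix/prefix overlap.

    Hypothetical helper for illustration; a dictionary of read prefixes
    stands in for the prefix tree mentioned in the abstract.
    """
    # Index every proper prefix of length >= min_overlap.
    by_prefix = defaultdict(list)
    for i, read in enumerate(reads):
        for k in range(min_overlap, len(read)):
            by_prefix[read[:k]].append(i)

    max_k = max((len(r) for r in reads), default=1) - 1
    used = set()
    contig = seed
    extended = True
    while extended:
        extended = False
        # Longest candidate overlap first, then progressively shorter ones.
        for k in range(min(len(contig), max_k), min_overlap - 1, -1):
            hits = [i for i in by_prefix.get(contig[-k:], []) if i not in used]
            if hits:
                used.add(hits[0])
                contig += reads[hits[0]][k:]  # append the non-overlapping tail
                extended = True
                break
    return contig


print(greedy_extend("ACGTAC", ["TACGGA", "GGATTC"]))
```

A real assembler would also extend in the 5' direction, consider reverse complements, and stop when several reads disagree about the next base; the sketch omits all of that.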
2.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (total citations: 20; self-citations: 0; citations by others: 20)
Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the
human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint
of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking
algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Bowtie is open source.
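As a rough illustration of what Burrows-Wheeler indexing makes possible, the sketch below builds a small index and finds exact matches by backward search. It is not Bowtie's implementation: the names `bwt_index` and `find` are hypothetical, the suffix array is built by naive sorting, and the quality-aware backtracking that permits mismatches is not modeled.

```python
def bwt_index(text):
    """Build a suffix array, C table and occurrence table for text + '$'."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # C[c]: number of characters in the text that sort before c.
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, C, occ


def find(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):  # backward search, last character first
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


sa, C, occ = bwt_index("ACGTACGTTACG")
print(find("ACG", sa, C, occ))
```

The search cost depends on the pattern length rather than the text length, which is the property that makes this family of indexes attractive for aligning many short reads.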
3.
The purpose of this work is to determine the most frequent short sequences in non-coding DNA. They may play a role in maintaining the structure and function of eukaryotic chromosomes. We present a simple method for the detection and analysis of such sequences in several genomes, including Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. We also study two chromosomes of man and mouse with a length similar to the whole genomes of the other species. We provide a list of the most common sequences of 9–14 bases in each genome. As expected, they are present in human Alu sequences. Our programs may also give a graph and a list of their position in the genome. Detection of clusters is also possible. In most cases, these sequences contain few alternating regions. Their intrinsic structure and their influence on nucleosome formation are not known. In particular, we have found new features of short sequences in C. elegans, which are distributed in heterogeneous clusters. They appear as punctuation marks in the chromosomes. Such clusters are not found in either A. thaliana or D. melanogaster. We discuss the possibility that they play a role in centromere function and homolog recognition in meiosis.
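Counting the most frequent sequences of a fixed length can be sketched in a few lines. This is only an assumed, simplified stand-in for the authors' programs: the function name `most_common_kmers` is hypothetical, and the sketch does not restrict the input to non-coding regions or report positions and clusters.

```python
from collections import Counter


def most_common_kmers(seq, k, top=5):
    """Return the `top` most frequent k-mers in `seq` as (k-mer, count) pairs."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Drop k-mers that contain ambiguous bases such as N.
    for kmer in [km for km in counts if set(km) - set("ACGT")]:
        del counts[kmer]
    return counts.most_common(top)


print(most_common_kmers("ATATATGC", 2, top=1))
```

For the 9 to 14 base range mentioned in the abstract, the same function would be called once per value of `k`.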
4.
A collection of user-interactive computer programs is described which aid in the assembly of DNA sequences. This is achieved by searching for the positions of overlapping common nucleotide sequences within the blocks of sequence obtained as primary data. Such overlapping segments are then melded into one continuous string of nucleotides. Strategies for determining the accuracy of the sequence being analyzed and reducing the error rate resulting from the manual manipulation of sequence data are discussed. Sequences mapping from 97.3 to 100% of the Ad2 virus genome were used to demonstrate the performance of these programs.
5.
Deanna M Church Valerie A Schneider Karyn Meltz Steinberg Michael C Schatz Aaron R Quinlan Chen-Shan Chin Paul A Kitts Bronwen Aken Gabor T Marth Michael M Hoffman Javier Herrero M Lisandra Zepeda Mendoza Richard Durbin Paul Flicek 《Genome biology》2015,16(1)
The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.
6.
The ability of peptide nucleic acid (PNA) to open up duplex DNA in a highly sequence-specific manner makes it possible to detect short DNA sequences on the background of or within genomic DNA under non-denaturing conditions. To do so, chosen marker sites in double-stranded DNA are locally opened by a pair of PNA openers, thus transforming one strand within the target region (20-30 bp) into the single-stranded form. Onto this accessible DNA sequence a circular oligonucleotide probe is assembled, which serves as a template for rolling circle amplification (RCA). Both homogeneous and heterogeneous assay formats are investigated, as are different formats for fluorescence-based amplicon detection. Our recent data with immobilized analytes suggest that marker sequences in plasmid and bacterial chromosomal DNA can be successfully detected.
7.
The genetic distance between two DNA sequences may be measured by the average number of nucleotide substitutions per position that has occurred since the two sequences diverged from a common ancestor. Estimates of this quantity can be derived from Markov models for the substitution process, while the variances are estimated using the delta method and confidence intervals calculated assuming normality. However, when the sampling distribution of the estimator deviates from normality, such intervals will not be accurate. For simple one-parameter models of nucleotide substitution, we propose a transformation of normal confidence intervals, which yields an almost exact approximation to the true confidence intervals of the distance estimators. To calculate confidence intervals for more complicated models, we propose the saddlepoint approximation. A simulation study shows that the saddlepoint-derived confidence intervals are a real improvement over existing methods.
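The baseline that this abstract criticizes (a delta-method variance with a normal confidence interval) can be made concrete for one familiar one-parameter model. The choice of the Jukes-Cantor model is an assumption made here for illustration, as is the function name `jc69_distance_ci`; the transformation and the saddlepoint approximation proposed by the authors are not reproduced.

```python
import math


def jc69_distance_ci(p, n, z=1.96):
    """Jukes-Cantor distance with a delta-method normal confidence interval.

    p: observed proportion of differing sites; n: number of aligned sites.
    """
    if not 0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor distance is undefined for p >= 0.75")
    d = -0.75 * math.log(1 - 4 * p / 3)
    # Delta method: Var(d) ~ (dd/dp)^2 * Var(p), with Var(p) = p(1 - p)/n.
    var_d = p * (1 - p) / (n * (1 - 4 * p / 3) ** 2)
    half = z * math.sqrt(var_d)
    return d, (d - half, d + half)


d, (lower, upper) = jc69_distance_ci(p=0.1, n=1000)
print(round(d, 4), round(lower, 4), round(upper, 4))
```

The interval is symmetric around the estimate by construction, which is one way to see why it becomes unreliable when the sampling distribution is skewed, for instance at large distances or with few sites.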
8.
Nonlinearity is important and ubiquitous in ecology. Though detectable in principle, nonlinear behavior is often difficult to characterize, analyze, and incorporate mechanistically into models of ecosystem function. One obvious reason is that quantitative nonlinear analysis tools are data intensive (require long time series), and time series in ecology are generally short. Here we demonstrate a useful method that circumvents data limitation and reduces sampling error by combining ecologically similar multispecies time series into one long time series. With this technique, individual ecological time series containing as few as 20 data points can be mined for such important information as (1) significantly improved forecast ability, (2) the presence and location of nonlinearity, and (3) the effective dimensionality (the number of relevant variables) of an ecological system.
9.
10.
V. S. Mikhailov V. K. Potapov R. N. Amirkhanov N. V. Amirkhanov S. S. Bulanenkova S. B. Akopov V. F. Zarytova L. G. Nikolaev E. D. Sverdlov 《Russian Journal of Bioorganic Chemistry》2013,39(1):72-76
The ability of short peptide nucleic acid (PNA) oligomers and oligonucleotides containing modified residues of 5-methylcytidine, 2-aminoadenosine, and 5-propynyl-2′-deoxyuridine (strong binding oligonucleotides, SBO) to affinity capture the target double-stranded DNA fragment from a mixture by means of end invasion was compared. Both types of probes were highly effective under the conditions used. The SBO-based probes may represent a handy and easily prepared alternative to PNA for selection of target DNA fragments in mixtures.
11.
Tandemly polymerized regulatory elements, antisense RNA segments or ribozymes are potentially useful in selective gene silencing. However, existing methods of tandemly polymerizing short DNA segments are laborious. We present a procedure that can create cloned arrays of 40-70 monomer units in two steps. We have created long arrays of regulatory elements and potential ribozyme sequences. Silencing of human immunodeficiency virus (HIV-1) activation by tandem arrays of a regulatory element in human immune system cells and in other human and monkey cells is discussed.
12.
Short single-copy probes have been widely used in plant molecular biology. However, they have rarely been effective in plant research using in situ hybridization techniques, possibly due to limitations imposed by the cell wall. We recently developed two fluorescence in situ hybridization protocols for single-copy sequence detection in soybean. By enzymatically removing the cell wall, single-copy sequences as short as 1 kb were detected by probes using standard fluorescence in situ hybridization or PCR-primed in situ hybridization (PCR-PRINS). Such technology is useful for genome analysis in plant molecular, cellular, and biotechnological research.
13.
We have developed a protocol for rapid sequencing of short DNA stretches (15–20 nt) using MALDI-TOF-MS. The protocol is based on the Sanger concept with the modification that double-stranded template DNA is used and all four sequencing reactions are performed in one reaction vial. The sequencing products are separated and detected by MALDI-TOF-MS and the sequence is determined by comparing measured molecular mass differences to expected values. The protocol is optimized for low costs and broad applicability. One reaction typically includes 300 fmol template, 10 pmol primer and 200 pmol each nucleotide monomer. Neither the primer nor any of the nucleotide monomers are labeled. Solid phase purification, concentration and mass spectrometric sample preparation of the sequencing products are accomplished in a few minutes and parallel processing of 96 samples is possible. The mass spectrometric analyses and subsequent sequence read-out require only a few seconds per template.
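The read-out step, comparing measured mass differences to expected values, can be sketched as follows. The residue masses are approximate average values supplied here as an assumption, not figures from the abstract, and `read_ladder` with its tolerance `tol` is a hypothetical helper. The sketch assumes a complete ladder of termination products with one peak per position.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA chain.
# These are illustrative values assumed for this sketch.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def read_ladder(masses, tol=1.0):
    """Translate consecutive mass differences of a ladder into bases."""
    masses = sorted(masses)
    bases = []
    for lighter, heavier in zip(masses, masses[1:]):
        diff = heavier - lighter
        # Pick the residue whose expected mass is closest to the difference.
        base, expected = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - diff))
        if abs(expected - diff) > tol:
            raise ValueError(f"mass difference {diff:.2f} Da matches no base")
        bases.append(base)
    return "".join(bases)


# Build a synthetic ladder for the sequence GATC from an arbitrary start mass.
ladder = [5000.0]
for base in "GATC":
    ladder.append(ladder[-1] + RESIDUE_MASS[base])
print(read_ladder(ladder))
```

The closest pair of expected differences, A and T, lies only about 9 Da apart, so the tolerance has to stay well below half of that for the calls to be unambiguous.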
14.
Single-molecule techniques for stretching DNA of contour lengths less than a kilobase are fraught with experimental difficulties. However, many interesting biological events such as histone binding and protein-mediated looping of DNA, occur on this length scale. In recent years, the mechanical properties of DNA have been shown to play a significant role in fundamental cellular processes like the packaging of DNA into compact nucleosomes and chromatin fibers. Clearly, it is then important to understand the mechanical properties of short stretches of DNA. In this paper, we provide a practical guide to a single-molecule optical tweezing technique that we have developed to study the mechanical behavior of DNA with contour lengths as short as a few hundred basepairs. The major hurdle in stretching short segments of DNA is that conventional optical tweezers are generally designed to apply force in a direction lateral to the stage (see Fig. 1). In this geometry, the angle between the bead and the coverslip, to which the DNA is tethered, becomes very steep for submicron length DNA. The axial position must now be accounted for, which can be a challenge, and, since the extension drags the microsphere closer to the coverslip, steric effects are enhanced. Furthermore, as a result of the asymmetry of the microspheres, lateral extensions will generate varying levels of torque due to rotation of the microsphere within the optical trap since the direction of the reactive force changes during the extension. Alternate methods for stretching submicron DNA run up against their own unique hurdles. For instance, a dual-beam optical trap is limited to stretching DNA of around a wavelength, at which point interference effects between the two traps and from light scattering between the microspheres begin to pose a significant problem. Replacing one of the traps with a micropipette would most likely suffer from similar challenges. 
While one could directly use the axial potential to stretch the DNA, an active feedback scheme would be needed to apply a constant force and the bandwidth of this will be quite limited, especially at low forces. We circumvent these fundamental problems by directly pulling the DNA away from the coverslip by using a constant force axial optical tweezers. This is achieved by trapping the bead in a linear region of the optical potential, where the optical force is constant, the strength of which can be tuned by adjusting the laser power. Trapping within the linear region also serves as an all optical force-clamp on the DNA that extends for nearly 350 nm in the axial direction. We simultaneously compensate for thermal and mechanical drift by finely adjusting the position of the stage so that a reference microsphere stuck to the coverslip remains at the same position and focus, allowing for a virtually limitless observation period.
15.
MOTIVATION: The overall performance of several molecular biology techniques involving DNA/DNA hybridization depends on the accurate prediction of the experimental value of a critical parameter: the melting temperature Tm. To date, many computer software programs based on different methods and/or parameterizations are available for the theoretical estimation of the experimental Tm value of any given short oligonucleotide sequence. However, in most cases, large and significant differences in the estimations of Tm were obtained while using different methods. Thus, it is difficult to decide which Tm value is the accurate one. In addition, it seems that most people who use these methods are unaware of the limitations, which are well described in the literature but are neither stated properly nor enforced as input restrictions by most of the web servers and standalone software programs that implement them. RESULTS: A quantitative comparison of the similarities and differences among some of the published DNA/DNA Tm calculation methods is reported. The comparison was carried out for a large set of short oligonucleotide sequences ranging from 16 to 30 nt long, which span the whole range of CG-content. The results showed that significant differences were observed in all the methods, which in some cases depend on the oligonucleotide length and CG-content in a non-trivial manner. Based on these results, the regions of consensus and disagreement for the methods in the oligonucleotide feature space were reported. Owing to the lack of sufficient experimental data, a fair and complete assessment of accuracy for the different methods is not yet possible. In spite of this limitation, a consensus Tm with minimal error probability was calculated by averaging the values obtained from two or more methods that exhibit similar behavior to each particular combination of oligonucleotide length and CG-content class. 
Using a total of 348 DNA sequences in the size range between 16mer and 30mer, for which the experimental Tm data are available, we demonstrated that the consensus Tm is a robust and accurate measure. It is expected that the results of this work will constitute a useful set of guidelines to be followed for the successful experimental implementation of various molecular biology techniques, such as quantitative PCR, multiplex PCR and the design of optimal DNA microarrays.
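The idea of a consensus Tm obtained by averaging several estimators can be sketched as below. The two formulas used, the Wallace rule and a simple GC-content formula, are widely quoted textbook approximations chosen here purely for illustration; they are not claimed to be the methods compared in the paper, and the paper's selection of methods by length and CG-content class is not modeled. All function names are hypothetical.

```python
def tm_wallace(seq):
    """Wallace rule: 2 degrees C per A/T and 4 degrees C per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def tm_gc_content(seq):
    """Simple GC-content approximation: 64.9 + 41 * (GC - 16.4) / N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def consensus_tm(seq, methods=(tm_wallace, tm_gc_content)):
    """Average the estimates of the supplied methods (illustrative only)."""
    estimates = [method(seq) for method in methods]
    return sum(estimates) / len(estimates)


oligo = "ACGTACGTACGTACGT"
print(tm_wallace(oligo), tm_gc_content(oligo), consensus_tm(oligo))
```

Passing the estimators as a parameter keeps the averaging step independent of which formulas are considered trustworthy for a given oligonucleotide length and composition.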
16.
Dmitry Meleshko Rui Yang Patrick Marks Stephen Williams Iman Hajirasouliha 《Nucleic acids research》2022,50(18):e108
Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion’s share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact.
17.
We have discovered that short guanine-rich oligonucleotides are able to self-associate into higher order structures that stimulate DNA synthesis in vitro without the addition of a conventional template [Ying, J., Bradley, R. K., Jones, L. B., Reddy, M. S., Colbert, D. T., Smalley, R. E., and Hardin, S. H. (1999) Biochemistry 38, 16461-16468]. Our initial analysis indicated the importance of the presence of three contiguous guanines (G) in an oligonucleotide that stimulates DNA polymerization. To gain insight into and to refine sequence requirements for the unexpected DNA synthesis, we analyzed a 231-member guanine-rich octamer library in a fluorescent nucleotide polymerization assay. We observe that, in addition to three contiguous Gs, the presence of a secondary G cluster within the octamer is essential. Furthermore, the location of the primary G cluster in the center of the molecule is most stimulatory. The majority of the octamers that form extended DNA products have a single non-G base separating the primary and secondary G clusters, the identity of which is predominantly thymine (T). Further, a T 5' or 3' of the primary G cluster positively influences the stimulatory function of the oligonucleotide. Overall, the occurrence of bases in the octamer is in the descending order of G > T > A > C. Our studies demonstrate that structures stabilized by noncanonical base pairings are recognized by a DNA polymerase in vitro, and these findings may have relevance within the cell. In particular, the features of these G-rich stimulatory sequences show striking similarities to telomeric sequences that form diverse G-quartet structures in vitro.
18.
19.
SUMMARY: Many biological papers describe short, functional DNA sites without specifying their exact positions in the genome. We have developed a Web server that automates the tedious task of locating such sites in eukaryotic genomes, thus giving access to the context of rich annotations that are increasingly available for genome sequences. AVAILABILITY: http://zlab.bu.edu/site2genome/
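The underlying task, finding every exact occurrence of a short site on either strand, can be sketched with plain string search. This is an assumed toy version, not the Web server's method: `locate_site` and `reverse_complement` are hypothetical names, only exact matches are reported, and no annotation is attached to the hits.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_site(genome, site):
    """Return sorted (position, strand) pairs for exact matches of `site`.

    Positions are 0-based and refer to the forward strand of `genome`.
    """
    hits = set()
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = genome.find(pattern)
        while start != -1:
            hits.add((start, strand))
            start = genome.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


print(locate_site("TTGATAACTATCGG", "GATA"))
```

A site that is its own reverse complement is reported once per strand at each position, which keeps the output symmetric for palindromic sites.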
20.
A Monte Carlo method has been developed for generating the conformations of short single-stranded DNAs from arbitrary starting states. The chain conformers are constructed from energetically favorable arrangements of the constituent mononucleotides. Minimum energy states of individual dinucleotide monophosphate molecules are identified using a torsion angle minimizer. The glycosyl and acyclic backbone torsions of the dimers are allowed to vary, while the sugar rings are held fixed in one of the two preferred puckered forms. A total of 108 conformationally distinct states per dimer are considered in this first stage of minimization. The torsion angles within 5 kcal/mole of the global minimum in the resulting optimized states are then allowed to vary by ±10° in an effort to estimate the breadth of the different local minima. The energies of a total of 2187 (3^7) angle combinations are examined per local conformational minimum. Finally, the energies of all dinucleotide conformers are scaled so that the populations of differently puckered sugar rings in the theoretical sample match those found in NMR solution studies. This last step is necessitated by limitations in the theoretical methods to predict DNA sugar puckering accurately. The conformer populations of the individual acyclic torsion angles in the composite dimer ensembles are found to be in good agreement with the distributions of backbone conformations deduced from NMR coupling constants and the frequencies of glycosyl conformations in x-ray crystal structures, suggesting that the low energy states are reasonable. The low energy dimer forms (consisting of 150–325 conformational states per dimer step) are next used as variables in a Monte Carlo algorithm, which generates the conformations of single-stranded d(CX_nG) chains, where X = A, T and n = 3, 4, 5. The oligonucleotides are built sequentially from the 5′ end of the chain using random numbers to select the conformations of overlapping dimer units. 
The simulations are very fast, involving a total of 10^6 conformations per chain sequence. The potential errors in the buildup procedure are minimized by taking advantage of known rotational interdependences in the sugar–phosphate backbone. The distributions of oligonucleotide conformations are examined in terms of the magnitudes, positions, and orientations of the end-to-end vectors of the chains. The differences in overall flexibility and extension of the oligomers are discussed in terms of the conformations of the constituent dinucleotide steps, while the general methodology is discussed and compared with other nucleic acid model building techniques. © 1993 John Wiley & Sons, Inc.
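The count of 2187 angle combinations per local minimum equals three to the seventh power, which is what one gets if seven torsion angles each take three values. Reading those three values as the minimum and the minimum shifted by 10° in either direction is an inference from the abstract, not something it states outright. Under that assumption, the enumeration can be sketched as follows; `perturbations` is a hypothetical helper and no energies are computed.

```python
from itertools import product

OFFSETS = (-10.0, 0.0, 10.0)  # assumed: shift each torsion by -10, 0 or +10 degrees


def perturbations(minimum):
    """Yield every combination of per-angle offsets around a local minimum."""
    for deltas in product(OFFSETS, repeat=len(minimum)):
        yield tuple(angle + delta for angle, delta in zip(minimum, deltas))


# Seven placeholder torsion angles (degrees) standing in for one local minimum.
local_minimum = (60.0, 180.0, 60.0, 150.0, 200.0, 290.0, 300.0)
print(sum(1 for _ in perturbations(local_minimum)))
```

Each generated tuple would then be passed to an energy function to estimate how broad the local minimum is.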