Similar Literature
20 similar documents found (search time: 15 ms)
1.
ABSTRACT: BACKGROUND: Molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows the generation of sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of "fraying" of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. RESULTS: The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. CONCLUSIONS: We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm makes it possible to generate larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.
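The kinds of per-sequence constraints listed in this abstract (GC content, G/C at both ends, no self-complementary stretch of a given length) can be illustrated with a small Python filter. This is only a sketch under stated assumptions: the function names, the default thresholds and the window-based self-complementarity test are hypothetical and are not taken from EGNAS, whose C++ implementation is not shown in the abstract.

```python
# Hypothetical sketch of per-sequence design filters; not the EGNAS code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase ACGT)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_gc_clamp(seq: str) -> bool:
    """True if the sequence starts and ends with G or C (limits fraying)."""
    return seq[0] in "GC" and seq[-1] in "GC"


def has_self_complementary_stretch(seq: str, length: int) -> bool:
    """True if some window of `length` bases also occurs in the reverse complement."""
    rc = reverse_complement(seq)
    windows = {seq[i:i + length] for i in range(len(seq) - length + 1)}
    return any(rc[i:i + length] in windows for i in range(len(rc) - length + 1))


def passes_filters(seq: str, gc_min: float = 0.4, gc_max: float = 0.6,
                   self_comp_len: int = 4) -> bool:
    """Apply all three filters; the default thresholds are assumed values."""
    return (gc_min <= gc_content(seq) <= gc_max
            and has_gc_clamp(seq)
            and not has_self_complementary_stretch(seq, self_comp_len))
```

An exhaustive generator would enumerate candidate sequences and keep those that pass such filters as well as the pairwise (interstrand) criteria, which this sketch does not cover.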

2.
A local algorithm for DNA sequence alignment with inversions   Cited by: 1 (self-citations: 0, citations by others: 1)
A dynamic programming algorithm to find all optimal alignments of DNA subsequences is described. The alignments use not only substitutions, insertions and deletions of nucleotides but also inversions (reversed complements) of substrings of the sequences. The inversion alignments themselves contain substitutions, insertions and deletions of nucleotides. We study the problem of alignment with non-intersecting inversions. To provide a computationally efficient algorithm we restrict candidate inversions to the K highest scoring inversions. An algorithm to find the J best non-intersecting alignments with inversions is also described. The new algorithm is applied to the regions of mitochondrial DNA of Drosophila yakuba and mouse coding for URF6 and cytochrome b and the inversion of the URF6 gene is found. The open problem of intersecting inversions is discussed.
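To make the notion of an inversion score concrete, the sketch below scores a candidate inversion as the best local alignment between one sequence and the reverse complement of the other. It is a simplified illustration, not the algorithm of the paper: the scoring values and function names are assumptions, and the selection of the K best non-intersecting inversions is not implemented.

```python
# Simplified illustration; scoring parameters and names are assumed.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase ACGT)."""
    return seq.translate(COMPLEMENT)[::-1]


def local_alignment_score(a: str, b: str, match: int = 1,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score, keeping only two DP rows."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def inversion_score(a: str, b: str, **scoring) -> int:
    """Score `b` as an inverted (reverse-complemented) match somewhere in `a`."""
    return local_alignment_score(a, reverse_complement(b), **scoring)
```

A region whose inversion score clearly exceeds its direct score is a candidate inversion in this simplified view.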

3.
4.
Myers' elegant and powerful bit-parallel dynamic programming algorithm for approximate string matching has a restriction that the query length should be within the word size of the computer, typically 64. We propose a modification of Myers' algorithm, in which the modification has a restriction not on the query length but on the maximum number of mismatches (substitutions, insertions, or deletions), which should be less than half of the word size. The time complexity is O(m log |Σ|), where m is the query length and |Σ| is the size of the alphabet Σ. Thus, it is particularly suited for sequences on a small alphabet such as DNA sequences. In particular, it is useful in quickly extending a large number of seed alignments against a reference genome for high-throughput short-read data produced by next-generation DNA sequencers.
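For orientation, the following Python sketch implements the original Myers bit-parallel search that this abstract builds on, not the proposed modification (which is not specified here in enough detail to reproduce). Python integers are unbounded, so the word-size restriction does not show up; the explicit `mask` plays the role of the machine word. The function name and return format are assumptions made for illustration.

```python
def myers_search(pattern: str, text: str, k: int):
    """Report (end_index, distance) for every text position where `pattern`
    matches a substring ending there with edit distance <= k.
    Original Myers bit-vector scheme; `pattern` must be non-empty."""
    m = len(pattern)
    mask = (1 << m) - 1          # emulates an m-bit machine word
    top = 1 << (m - 1)           # bit of the last pattern row
    peq = {}                     # per-character match bitmasks
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m   # vertical +1 / -1 deltas, current bottom score
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask    # horizontal +1 deltas
        mh = pv & xh                     # horizontal -1 deltas
        if ph & top:
            score += 1
        elif mh & top:
            score -= 1
        ph = (ph << 1) & mask            # no carry-in: a match may start anywhere
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append((j, score))
    return hits
```

Each text character costs a constant number of word operations when the pattern fits into one word, which is the property the abstract's modification aims to keep for longer queries.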

5.
Information theory is a branch of mathematics that overlaps with communications, biology, and medical engineering. Entropy is a measure of the uncertainty in a set of information. In this study, the entropy of each gene and of its set of exons was calculated in orders one to four. Based on the relative entropy of genes and exons, the Kullback-Leibler divergence was calculated. After obtaining the Kullback-Leibler distance for the gene and exon sets, the results were entered as input into 7 clustering algorithms: single, complete, average, weighted, centroid, median, and K-means. To aggregate the results of clustering, the AdaBoost algorithm was used. Finally, the results of the AdaBoost algorithm were investigated with the GeneMANIA prediction server to explore them from the gene annotation point of view. All calculations were performed using the MATLAB Engineering Software (2015). Investigation of the metabolic pathways of the genes based on their annotations revealed that our proposed clustering method yielded correct, logical, and fast results. At the same time, the method avoids the disadvantages of alignment, allows genes to be considered with their actual length and content, and does not require large amounts of memory for long sequences. We believe that the proposed method could be used alongside other competitive gene clustering methods to group biologically relevant sets of genes. The proposed method can also be seen as a predictive method for genes with weak genomic annotations.
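The entropy and Kullback-Leibler steps can be sketched in Python as follows. This is a hedged illustration rather than the study's MATLAB code: it assumes that "order k" refers to overlapping k-mers over the alphabet ACGT, and the pseudocount (needed so that the divergence stays finite) is an assumption of this sketch.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(seq: str, k: int, pseudocount: float = 1.0) -> dict:
    """Relative frequencies of all 4**k k-mers in `seq` (overlapping windows).
    Use pseudocount > 0 when the result is passed to kl_divergence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def entropy(dist: dict) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def kl_divergence(p: dict, q: dict) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; q must be non-zero
    wherever p is non-zero."""
    return sum(p[m] * math.log2(p[m] / q[m]) for m in p if p[m] > 0)
```

Pairwise divergences computed this way (for example between a gene and its exon set, for k from 1 to 4) could then serve as the input to a clustering routine.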

6.
Two data structures designated Fragment and Construct are described. The Fragment data structure defines a continuous nucleic acid sequence from a unique genetic origin. The Construct defines a continuous sequence composed of sequences from multiple genetic origins. These data structures are manipulated by a set of software tools to simulate the construction of mosaic recombinant DNA molecules. They are also used as an interface between sequence data banks and analytical programs.
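A minimal Python rendering of the two data structures might look as follows. Only the names Fragment and Construct come from the abstract; the fields, methods and the example origins are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Fragment:
    """A continuous nucleic acid sequence from a single genetic origin."""
    origin: str      # hypothetical field, e.g. a plasmid or genome name
    sequence: str


@dataclass
class Construct:
    """A continuous sequence assembled from fragments of several origins."""
    parts: List[Fragment] = field(default_factory=list)

    def add(self, fragment: Fragment) -> None:
        """Append a fragment to the 3' end of the construct (simulated ligation)."""
        self.parts.append(fragment)

    @property
    def sequence(self) -> str:
        """The full sequence of the mosaic molecule."""
        return "".join(part.sequence for part in self.parts)

    def origins(self) -> List[str]:
        """Distinct origins in order of first appearance."""
        seen: List[str] = []
        for part in self.parts:
            if part.origin not in seen:
                seen.append(part.origin)
        return seen
```

Keeping the origin on every fragment lets a tool trace any position of the assembled sequence back to its source, which is what makes such a structure usable as a bridge between data banks and analysis programs.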

7.
Monroe WT  Haselton FR 《BioTechniques》2003,34(1):68-70, 72-3
A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

8.
A computer algorithm has been developed which identifies tRNA genes and tRNA-like structures in DNA sequences. The program searches the sequence string for specific base positions that correspond to the invariant and semi-invariant bases found in tRNAs. The tRNA nature of the sequence is confirmed by the presence of complementary base pairing at the tRNA's calculated 5' and 3' ends (which in situ constitutes the amino-acyl stem region). The program achieves greater than 96% accuracy when run against known tRNA sequences in the Genbank database. The program is modular and is readily modified to allow searching either a file or database. The program is written in "C" and operates on a D.E.C. Vax 750. The utility of the algorithm is demonstrated by the identification of a distinctive tRNA structure in an intron of a published bovine hemoglobin gene.
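The confirmation step described here, complementary pairing between the 5' and 3' ends of a candidate, can be sketched in Python as below. The stem length of 7, the minimum of 6 pairs, the optional G-T wobble pairs and the function names are assumptions of this sketch; the original C program and its invariant-base search are not reproduced.

```python
# Hypothetical sketch of an end-pairing check; not the original C program.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "T"), ("T", "G")}


def stem_pairs(seq: str, stem_len: int = 7, allow_wobble: bool = False) -> int:
    """Count paired positions between the first and last `stem_len` bases,
    pairing base i from the 5' end with base i from the 3' end."""
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    return sum((seq[i], seq[-1 - i]) in allowed for i in range(stem_len))


def looks_like_acceptor_stem(seq: str, stem_len: int = 7,
                             min_pairs: int = 6) -> bool:
    """True if the candidate's two ends can form a mostly paired stem."""
    return len(seq) >= 2 * stem_len and stem_pairs(seq, stem_len) >= min_pairs
```

In a full search, a check of this kind would only be applied to candidates whose invariant and semi-invariant positions have already matched.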

9.
Code domains in tandem repetitive DNA sequence structures   Cited by: 6 (self-citations: 0, citations by others: 6)
Peter Vogt 《Chromosoma》1992,101(10):585-589
Traditionally, many people doing research in molecular biology attribute coding properties to a given DNA sequence if this sequence contains an open reading frame for translation into a sequence of amino acids. This protein coding capability of DNA was detected about 30 years ago. The underlying genetic code is highly conserved and present in every biological species studied so far. Today, it is obvious that DNA has a much larger coding potential for other important tasks. Apart from coding for specific RNA molecules such as rRNA, snRNA and tRNA molecules, specific structural and sequence patterns of the DNA chain itself express distinct codes for the regulation and expression of its genetic activity. A chromatin code has been defined for phasing of the histone-octamer protein complex in the nucleosome. A translation frame code has been shown to exist that determines correct triplet counting at the ribosome during protein synthesis. A loop code seems to organize the single stranded interaction of the nascent RNA chain with proteins during the splicing process, and a splicing code phases successive 5' and 3' splicing sites. Most of these DNA codes are not exclusively based on the primary DNA sequence itself, but also seem to include specific features of the corresponding higher order structures. Based on the view that these various DNA codes are genetically instructive for specific molecular interactions or processes, important in the nucleus during interphase and during cell division, the coding capability of tandem repetitive DNA sequences has recently been reconsidered.

10.
Quadruplex DNA crystal structures and drug design   Cited by: 3 (self-citations: 0, citations by others: 3)
Neidle S  Parkinson GN 《Biochimie》2008,90(8):1184-1196
Crystallographic studies of G-quadruplex nucleic acids have resulted in a small group of structures to date. Their morphological and detailed conformational features are described here, emphasizing the stability of the G-tetrad core and the flexibility of loops, especially upon ligand binding. Implications for drug design are discussed, in the context of the druggability of both telomeric and non-telomeric quadruplex DNAs.

11.
12.
Representation of sequence similarity by dot matrix plots is a method widely used for comparing biological sequences. The user is presented with an overall view of similarity between two sequences. Computation of this plot has been reconsidered here. An improvement is proposed through the preprocessing of the data into an automaton recognizing the word structure of a sequence. The main advantage of this approach is to systematically eliminate the repetitions during word comparison. Simple heuristics are also considered to greatly speed up pattern matching. As a result, large sequences are handled very efficiently. This is illustrated by a comparison of large genomic DNA. The algorithm has been implemented in an interactive application on a microcomputer.
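The benefit of preprocessing one sequence by its words can be illustrated with a plain hash index in Python. Note that this swaps in a dictionary of word positions for the automaton described in the abstract; the word length and function name are assumptions of the sketch.

```python
from collections import defaultdict


def word_matches(a: str, b: str, word: int = 3):
    """Return dot-plot coordinates (i, j) where a[i:i+word] == b[j:j+word].
    Sequence `a` is indexed once, so each repeated word is stored a single
    time with all of its positions instead of being compared repeatedly."""
    index = defaultdict(list)
    for i in range(len(a) - word + 1):
        index[a[i:i + word]].append(i)

    dots = []
    for j in range(len(b) - word + 1):
        for i in index.get(b[j:j + word], ()):
            dots.append((i, j))
    return dots
```

The running time is proportional to the two sequence lengths plus the number of dots, rather than to the product of the lengths as in a naive all-against-all comparison.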

13.
14.
The ability to infer relationships between groups of sequences, either by searching for their evolutionary history or by comparing their sequence similarity, can be a crucial step in hypothesis testing. Interpreting relationships of human immunodeficiency virus type 1 (HIV-1) sequences can be challenging because of their rapidly evolving genomes, but it may also lead to a better understanding of the underlying biology. Several studies have focused on the evolution of HIV-1, but there is little information to link sequence similarities and evolutionary histories of HIV-1 to the epidemiological information of the infected individual. Our goal was to correlate patterns of HIV-1 genetic diversity with epidemiological information, including risk and demographic factors. These correlations were then used to predict epidemiological information through analyzing short stretches of HIV-1 sequence. Using standard phylogenetic and phenetic techniques on 100 HIV-1 subtype B sequences, we were able to show some correlation between the viral sequences and the geographic area of infection and the risk of men who engage in sex with men. To help identify more subtle relationships between the viral sequences, the method of multidimensional scaling (MDS) was performed. That method identified statistically significant correlations between the viral sequences and the risk factors of men who engage in sex with men and individuals who engage in sex with injection drug users or use injection drugs themselves. Using tree construction, MDS, and newly developed likelihood assignment methods on the original 100 samples we sequenced, and also on a set of blinded samples, we were able to predict demographic/risk group membership at a rate statistically better than by chance alone. Such methods may make it possible to identify viral variants belonging to specific demographic groups by examining only a small portion of the HIV-1 genome. 
Such predictions of demographic epidemiology based on sequence information may become valuable in assigning different treatment regimens to infected individuals.

15.
MOTIVATION: The sensitivity and specificity of branched DNA (bDNA) assays are derived in part through the judicious design of the capture and label extender probes. To minimize non-specific hybridization (NSH) events, which elevate assay background, candidate probes must be computer screened for complementarity with generic sequences present in the assay. RESULTS: We present a software application which allows for rapid and flexible design of bDNA probesets for novel targets. It includes an algorithm for estimating the magnitude of NSH contribution to background, a mechanism for removing probes with elevated contributions, a methodology for the simultaneous design of probesets for multiple targets, and a graphical user interface which guides the user through the design steps. AVAILABILITY: The program is available as a commercial package through the Pharmaceutical Drug Discovery program at Chiron Diagnostics.

16.
This article presents a new graph-based algorithm for identifying branched metabolic pathways in multi-genome scale metabolic data. The term branched is used to refer to metabolic pathways between compounds that consist of multiple pathways that interact biochemically. A branched pathway may produce a target compound through a combination of linear pathways that split compounds into smaller ones, work in parallel with many compounds, and join compounds into larger ones. While branched metabolic pathways predominate in metabolic networks, most previous work has focused on identifying linear metabolic pathways. The ability to automatically identify branched pathways is important in applications that require a deeper understanding of metabolism, such as metabolic engineering and drug target identification. The algorithm presented in this article utilizes explicit atom tracking to identify linear metabolic pathways and then merges them together into branched metabolic pathways. We provide results on several well-characterized metabolic pathways that demonstrate that the new merging approach can efficiently find biologically relevant branched metabolic pathways.

17.
Claspin is an essential protein for the ATR-dependent activation of the DNA replication checkpoint response in Xenopus and human cells. Here we describe the purification and characterization of human Claspin. The protein has a ring-like structure and binds with high affinity to branched DNA molecules. These findings suggest that Claspin may be a component of the replication ensemble and plays a role in the replication checkpoint by directly associating with replication forks and with the various branched DNA structures likely to form at stalled replication forks because of DNA damage.

18.
19.
MOTIVATION: Recently, the concept of the constrained sequence alignment was proposed to incorporate the knowledge of biologists about structures/functionalities/consensuses of their datasets into sequence alignment such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficiently obtain a constrained alignment of several sequences. However, the kernels of these programs, the dynamic programming algorithms for computing an optimal constrained alignment between two sequences, run in O(γn²) memory, where γ is the number of the constraints and n is the maximum of the lengths of sequences. As a result, such a high memory requirement limits the overall programs to align short sequences only. RESULTS: We adopt the divide-and-conquer approach to design a memory-efficient algorithm for computing an optimal constrained alignment between two sequences, which greatly reduces the memory requirement of the dynamic programming approaches at the expense of a small constant factor in CPU time. This new algorithm consumes only O(αn) space, where α is the sum of the lengths of constraints and usually α < n in practical applications. Based on this algorithm, we have developed a memory-efficient tool for multiple sequence alignment with constraints. AVAILABILITY: http://genome.life.nctu.edu.tw/MUSICME.
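The memory saving behind divide-and-conquer alignment rests on the observation that the score of an alignment can be computed while keeping only one row of the dynamic programming table. The Python sketch below shows that building block for ordinary, unconstrained global alignment; it is not the constrained algorithm of the paper, and the scoring values and function name are assumptions.

```python
def last_row_scores(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1):
    """Global (Needleman-Wunsch) alignment scores of `a` against every prefix
    of `b`, computed with O(len(b)) memory instead of O(len(a) * len(b))."""
    prev = [j * gap for j in range(len(b) + 1)]
    for ch_a in a:
        cur = [prev[0] + gap]                # aligning ch_a against a gap
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev
```

A Hirschberg-style divide-and-conquer scheme calls such a routine on the first half of one sequence and on the reversed second half, picks the split column that maximizes the summed scores, and recurses on the two subproblems, which is how the alignment itself is recovered in linear space.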

20.
Local variability and base sequence effects in DNA crystal structures   Cited by: 1 (self-citations: 0, citations by others: 1)
The importance and usefulness of local doublet parameters in understanding sequence dependent effects has been described for A- and B-DNA oligonucleotide crystal structures. Each of the two sets of local parameters described by us in the NUPARM algorithm, namely the local doublet parameters, calculated with reference to the mean z-axis, and the local helical parameters, calculated with reference to the local helix axis, is sufficient to describe the oligonucleotide structures, with the local helical parameters giving a slightly magnified picture of the variations in the structures. The values of local doublet parameters calculated by the NUPARM algorithm are similar to those calculated by the NEWHELIX90 program, only if the oligonucleotide fragment is not too distorted. The mean values obtained using all the available data for B-DNA crystals are not significantly different from those obtained when a limited data set is used, consisting only of structures with a data resolution of better than 2.4 Å and without any bound drug molecule. Thus the variation observed in the oligonucleotide crystals appears to be independent of the quality of their crystallinity. No strong correlation is seen between any pair of local doublet parameters but the local helical parameters are interrelated by geometric relationships. An interesting feature that emerges from this analysis is that the local rise along the z-axis is highly correlated with the difference in the buckle values of the two basepairs in the doublet, as suggested earlier for the dodecamer structures (Bansal and Bhattacharyya, in Structure & Methods: DNA & RNA, Vol. 3 (Eds., R.H. Sarma and M.H. Sarma), pp. 139-153 (1990)). In fact the local rise values become almost constant for both A- and B-forms, if a correction is applied for the buckling of the basepairs.
In B-DNA the AA, AT, TA and GA basepair sequences generally have a smaller local rise (3.25 Å) compared to the other sequences (3.4 Å) and this seems to be an intrinsic feature of basepair stacking interaction and not related to any other local doublet parameter. The roll angles in B-DNA oligonucleotides have small values (less than ±8 degrees), while mean local twist varies from 24 degrees to 45 degrees. The CA/TG doublet sequences show two types of preferred geometries, one with positive roll, small positive slide and reduced twist and another with negative roll, large positive slide and increased twist. (ABSTRACT TRUNCATED AT 400 WORDS)
