Similar literature
1.
ABSTRACT: BACKGROUND: The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that generates sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of "fraying" of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. RESULTS: The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. CONCLUSIONS: We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm generates larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.
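
The kinds of constraints listed in this abstract (GC content, G/C at both ends, no self-complementary stretch of a given length) can be illustrated with a small filter. This is only a sketch under stated assumptions: it is not the EGNAS algorithm or its C++ code, and the function names, parameters, and thresholds are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_self_complement(seq: str, length: int) -> bool:
    """True if the reverse complement of some window of `length` bases
    also occurs in `seq` (a crude proxy for hairpins and self-dimers)."""
    for i in range(len(seq) - length + 1):
        if reverse_complement(seq[i:i + length]) in seq:
            return True
    return False


def passes_constraints(seq: str, gc_min: float, gc_max: float,
                       self_len: int) -> bool:
    """Check one candidate against three illustrative constraints."""
    if not gc_min <= gc_content(seq) <= gc_max:
        return False
    # Assumed "GC clamp": first and last base must be G or C.
    if seq[0] not in "GC" or seq[-1] not in "GC":
        return False
    return not has_self_complement(seq, self_len)


if __name__ == "__main__":
    for candidate in ("GACGTC", "GATTAC"):
        print(candidate, passes_constraints(candidate, 0.3, 0.7, 4))
```

An exhaustive generator would enumerate candidates and keep those that pass, and would additionally check each candidate against the sequences already accepted, which this sketch does not do.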

2.
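Pattern discovery is an important research direction in bioinformatics, but most current algorithms cannot guarantee that the optimal pattern is found. This paper derives a criterion for the similarity relationships among three sequence segments, uses it as a pruning rule, and proposes and implements a depth-first exhaustive search algorithm, the criterion search algorithm (CRISA). Theoretical analysis shows that, for the vast majority of pattern discovery problems, CRISA has polynomial time complexity and linear space complexity. Tests on simulated and real biological sequence data also show that CRISA identifies all patterns in the sequences quickly and completely, achieves a better overall evaluation than other algorithms, and can be applied to practical pattern discovery problems.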

3.
A local algorithm for DNA sequence alignment with inversions   Total citations: 1, self-citations: 0, citations by others: 1
A dynamic programming algorithm to find all optimal alignments of DNA subsequences is described. The alignments use not only substitutions, insertions and deletions of nucleotides but also inversions (reversed complements) of substrings of the sequences. The inversion alignments themselves contain substitutions, insertions and deletions of nucleotides. We study the problem of alignment with non-intersecting inversions. To provide a computationally efficient algorithm we restrict candidate inversions to the K highest scoring inversions. An algorithm to find the J best non-intersecting alignments with inversions is also described. The new algorithm is applied to the regions of mitochondrial DNA of Drosophila yakuba and mouse coding for URF6 and cytochrome b and the inversion of the URF6 gene is found. The open problem of intersecting inversions is discussed.
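
The basic idea of scoring an inversion, aligning one sequence against the reverse complement of the other, can be sketched as below. This is not the paper's algorithm: it compares only whole-sequence orientations, does not search for the K best or non-intersecting inversions, and the scoring values (`match`, `mismatch`, `gap`) are assumed for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def best_orientation(a: str, b: str):
    """Compare aligning `a` to `b` directly and to its reverse complement."""
    forward = local_score(a, b)
    inverted = local_score(a, reverse_complement(b))
    return ("forward", forward) if forward >= inverted else ("inverted", inverted)


if __name__ == "__main__":
    x = "ACGGTCA"
    y = "TT" + reverse_complement(x) + "TT"  # x embedded as an inversion
    print(best_orientation(x, y))
```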

4.
5.
Two data structures designated Fragment and Construct are described. The Fragment data structure defines a continuous nucleic acid sequence from a unique genetic origin. The Construct defines a continuous sequence composed of sequences from multiple genetic origins. These data structures are manipulated by a set of software tools to simulate the construction of mosaic recombinant DNA molecules. They are also used as an interface between sequence data banks and analytical programs.
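
One possible reading of the two structures is sketched below. The abstract does not give the actual fields, so the attribute names, the methods, and the example origins and sequences are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Fragment:
    """A continuous sequence from a single genetic origin."""
    origin: str      # hypothetical label for the source molecule
    sequence: str


@dataclass
class Construct:
    """A continuous sequence assembled from fragments of several origins."""
    fragments: List[Fragment] = field(default_factory=list)

    def append(self, fragment: Fragment) -> None:
        """Simulate joining one more fragment to the end of the construct."""
        self.fragments.append(fragment)

    def sequence(self) -> str:
        """Full sequence of the mosaic molecule."""
        return "".join(f.sequence for f in self.fragments)

    def origins(self) -> List[str]:
        """Origins in the order in which the fragments were joined."""
        return [f.origin for f in self.fragments]


if __name__ == "__main__":
    construct = Construct()
    construct.append(Fragment("origin_A", "GAATTC"))  # placeholder data
    construct.append(Fragment("origin_B", "AAGCTT"))  # placeholder data
    print(construct.sequence(), construct.origins())
```

Keeping each Fragment immutable means a Construct can always report which origin every stretch of its sequence came from.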

6.
Information theory is a branch of mathematics that overlaps with communications, biology, and medical engineering. Entropy is a measure of the uncertainty in a set of information. In this study, the entropy of each gene and of its exon set was calculated for orders one to four. Based on the relative entropy of genes and exons, the Kullback-Leibler divergence was calculated. The Kullback-Leibler distances for the gene and exon sets were then used as input to 7 clustering algorithms: single, complete, average, weighted, centroid, median, and K-means. To aggregate the clustering results, the AdaBoost algorithm was used. Finally, the results of the AdaBoost algorithm were examined with the GeneMANIA prediction server to assess them from a gene annotation point of view. All calculations were performed using the MATLAB Engineering Software (2015). An investigation of the genes' metabolic pathways based on gene annotations showed that the proposed clustering method yielded correct, logical, and fast results. The method avoids the disadvantages of alignment, allows genes to be considered with their actual length and content, and does not require large amounts of memory for long sequences. We believe that the proposed method could be used alongside other competitive gene clustering methods to group biologically relevant sets of genes. The proposed method can also be seen as a predictive method for genes with weak genomic annotations.
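
The entropy and Kullback-Leibler quantities mentioned here can be sketched as follows. The study used MATLAB and does not spell out its exact definitions in the abstract, so this Python version rests on assumptions: "order k" is read as the Shannon entropy of the k-mer frequency distribution, and the `pseudo` value used for words missing from the second distribution is an illustrative choice.

```python
import math
from collections import Counter
from typing import Dict


def kmer_distribution(seq: str, k: int) -> Dict[str, float]:
    """Relative frequencies of overlapping words of length k."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def entropy(p: Dict[str, float]) -> float:
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in p.values())


def kl_divergence(p: Dict[str, float], q: Dict[str, float],
                  pseudo: float = 1e-9) -> float:
    """D(p || q) in bits; words absent from q get the assumed value `pseudo`."""
    return sum(v * math.log2(v / q.get(word, pseudo)) for word, v in p.items())


if __name__ == "__main__":
    gene = "ATGGCGTACGTTAGC"   # made-up sequences for illustration
    exon = "ATGGCGTTAGC"
    for k in range(1, 5):      # orders one to four
        p, q = kmer_distribution(gene, k), kmer_distribution(exon, k)
        print(k, round(entropy(p), 3), round(kl_divergence(p, q), 3))
```

The resulting divergences could then be fed to any clustering routine as a distance-like input; since the Kullback-Leibler divergence is not symmetric, a symmetrized form may be needed for methods that expect a true distance.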

7.
Myers' elegant and powerful bit-parallel dynamic programming algorithm for approximate string matching has a restriction that the query length should be within the word size of the computer, typically 64. We propose a modification of Myers' algorithm, in which the modification has a restriction not on the query length but on the maximum number of mismatches (substitutions, insertions, or deletions), which should be less than half of the word size. The time complexity is O(m log |Σ|), where m is the query length and |Σ| is the size of the alphabet Σ. Thus, it is particularly suited for sequences on a small alphabet such as DNA sequences. In particular, it is useful in quickly extending a large number of seed alignments against a reference genome for high-throughput short-read data produced by next-generation DNA sequencers.
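
For orientation, the classic Myers bit-vector recurrence that this abstract builds on can be sketched as below. It is not the modified algorithm proposed in the paper. Python integers are unbounded, so the word-size restriction does not appear here; a mask of m bits stands in for the machine word, and the function name is hypothetical.

```python
from typing import List


def myers_search(pattern: str, text: str) -> List[int]:
    """For each text position j, return the smallest edit distance between
    `pattern` and any substring of `text` ending at j."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1          # emulate an m-bit machine word
    high = 1 << (m - 1)          # bit of the last pattern row
    peq = {}                     # per-character match bit-vectors
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m   # vertical +1 / -1 deltas, last-row score
    scores = []
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        scores.append(score)
    return scores


if __name__ == "__main__":
    print(myers_search("ACGT", "TTACGTTT"))
```

Each text character costs a constant number of word operations, which is why the original method is fast as long as the pattern fits in one word.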

8.
Monroe WT  Haselton FR 《BioTechniques》2003,34(1):68-70, 72-3
A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

9.
A computer algorithm has been developed which identifies tRNA genes and tRNA-like structures in DNA sequences. The program searches the sequence string for specific base positions that correspond to the invariant and semi-invariant bases found in tRNAs. The tRNA nature of the sequence is confirmed by the presence of complementary base pairing at the tRNA's calculated 5' and 3' ends (which in situ constitutes the amino-acyl stem region). The program achieves greater than 96% accuracy when run against known tRNA sequences in the Genbank database. The program is modular and is readily modified to allow searching either a file or database. The program is written in "C" and operates on a D.E.C. Vax 750. The utility of the algorithm is demonstrated by the identification of a distinctive tRNA structure in an intron of a published bovine hemoglobin gene.

10.
Code domains in tandem repetitive DNA sequence structures   Total citations: 6, self-citations: 0, citations by others: 6
Peter Vogt 《Chromosoma》1992,101(10):585-589
Traditionally, many people doing research in molecular biology attribute coding properties to a given DNA sequence if this sequence contains an open reading frame for translation into a sequence of amino acids. This protein coding capability of DNA was detected about 30 years ago. The underlying genetic code is highly conserved and present in every biological species studied so far. Today, it is obvious that DNA has a much larger coding potential for other important tasks. Apart from coding for specific RNA molecules such as rRNA, snRNA and tRNA molecules, specific structural and sequence patterns of the DNA chain itself express distinct codes for the regulation and expression of its genetic activity. A chromatin code has been defined for phasing of the histone-octamer protein complex in the nucleosome. A translation frame code has been shown to exist that determines correct triplet counting at the ribosome during protein synthesis. A loop code seems to organize the single stranded interaction of the nascent RNA chain with proteins during the splicing process, and a splicing code phases successive 5' and 3' splicing sites. Most of these DNA codes are not exclusively based on the primary DNA sequence itself, but also seem to include specific features of the corresponding higher order structures. Based on the view that these various DNA codes are genetically instructive for specific molecular interactions or processes, important in the nucleus during interphase and during cell division, the coding capability of tandem repetitive DNA sequences has recently been reconsidered.

11.
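The ant colony genetic algorithm is an improved algorithm obtained by using a genetic algorithm to optimize the parameters of the ant colony algorithm. The ant colony genetic algorithm was applied to DNA sequence alignment, and the results show that this new sequence alignment algorithm is very effective.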

12.
Quadruplex DNA crystal structures and drug design   Total citations: 3, self-citations: 0, citations by others: 3
Neidle S  Parkinson GN 《Biochimie》2008,90(8):1184-1196
Crystallographic studies of G-quadruplex nucleic acids have resulted in a small group of structures to date. Their morphological and detailed conformational features are described here, emphasizing the stability of the G-tetrad core and the flexibility of loops, especially upon ligand binding. Implications for drug design are discussed, in the context of the druggability of both telomeric and non-telomeric quadruplex DNAs.

13.
14.
Representation of sequence similarity by dot matrix plots is a method widely used for comparing biological sequences. The user is presented with an overall view of similarity between two sequences. Computation of this plot has been reconsidered here. An improvement is proposed through the preprocessing of the data into an automaton recognizing the word structure of a sequence. The main advantage of this approach is to systematically eliminate the repetitions during word comparison. Simple heuristics are also considered to greatly speed up pattern matching. As a result, large sequences are handled very efficiently. This is illustrated by a comparison of large genomic DNA. The algorithm has been implemented in an interactive application on a microcomputer.
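
The benefit of preprocessing one sequence so that each word is looked up once, rather than compared repeatedly, can be sketched with a simple word index. A hash table is used here in place of the automaton described in the abstract, so this shows the general idea only; the function name and the word length `k` are assumptions.

```python
from collections import defaultdict
from typing import List, Tuple


def word_matches(seq_a: str, seq_b: str, k: int) -> List[Tuple[int, int]]:
    """Return dot-plot coordinates (i, j) where the k-letter word starting
    at seq_a[i] equals the k-letter word starting at seq_b[j]."""
    # Preprocess seq_a once: each distinct word maps to all its positions.
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)

    dots = []
    for j in range(len(seq_b) - k + 1):
        # One lookup per word of seq_b instead of a scan over seq_a.
        for i in index.get(seq_b[j:j + k], ()):
            dots.append((i, j))
    return dots


if __name__ == "__main__":
    for i, j in word_matches("ACGTACGT", "CGTA", 3):
        print(i, j)
```

Runs of dots along a diagonal correspond to longer shared regions, which is what the plot is meant to make visible.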

15.
DNA lesions such as crosslinks represent obstacles for the replication machinery. Nonetheless, replication can proceed via the DNA damage tolerance pathway, also known as the postreplicative repair pathway. SNF2 ATPase Rad5 homologs, such as RAD5A of the model plant Arabidopsis thaliana, are important for the error‐free mode of this pathway. We were able to demonstrate previously that RAD5A is a key factor in the repair of DNA crosslinks in Arabidopsis. Here, we show by in vitro analysis that AtRAD5A protein is a DNA translocase able to catalyse fork regression. Interestingly, replication forks with a gap in the leading strand are processed best, in line with its suggested function. Furthermore, AtRAD5A catalyses branch migration of a Holliday junction and is not impaired by the DNA binding of a model protein, which is indicative of its ability to displace other proteins. Rad5 homologs possess HIRAN (Hip116, Rad5; N‐terminal) domains. By biochemical analysis we were able to demonstrate that the HIRAN domain variant from Arabidopsis RAD5A mediates structure selective DNA binding without the necessity for a free 3′OH group as has been shown to be required for binding of HIRAN domains in a mammalian RAD5 homolog. The biological importance of the HIRAN domain in AtRAD5A is demonstrated by our result that it is required for its function in DNA crosslink repair in vivo.

16.
The ability to infer relationships between groups of sequences, either by searching for their evolutionary history or by comparing their sequence similarity, can be a crucial step in hypothesis testing. Interpreting relationships of human immunodeficiency virus type 1 (HIV-1) sequences can be challenging because of their rapidly evolving genomes, but it may also lead to a better understanding of the underlying biology. Several studies have focused on the evolution of HIV-1, but there is little information to link sequence similarities and evolutionary histories of HIV-1 to the epidemiological information of the infected individual. Our goal was to correlate patterns of HIV-1 genetic diversity with epidemiological information, including risk and demographic factors. These correlations were then used to predict epidemiological information through analyzing short stretches of HIV-1 sequence. Using standard phylogenetic and phenetic techniques on 100 HIV-1 subtype B sequences, we were able to show some correlation between the viral sequences and the geographic area of infection and the risk of men who engage in sex with men. To help identify more subtle relationships between the viral sequences, the method of multidimensional scaling (MDS) was performed. That method identified statistically significant correlations between the viral sequences and the risk factors of men who engage in sex with men and individuals who engage in sex with injection drug users or use injection drugs themselves. Using tree construction, MDS, and newly developed likelihood assignment methods on the original 100 samples we sequenced, and also on a set of blinded samples, we were able to predict demographic/risk group membership at a rate statistically better than by chance alone. Such methods may make it possible to identify viral variants belonging to specific demographic groups by examining only a small portion of the HIV-1 genome. 
Such predictions of demographic epidemiology based on sequence information may become valuable in assigning different treatment regimens to infected individuals.

17.
Claspin is an essential protein for the ATR-dependent activation of the DNA replication checkpoint response in Xenopus and human cells. Here we describe the purification and characterization of human Claspin. The protein has a ring-like structure and binds with high affinity to branched DNA molecules. These findings suggest that Claspin may be a component of the replication ensemble and plays a role in the replication checkpoint by directly associating with replication forks and with the various branched DNA structures likely to form at stalled replication forks because of DNA damage.

18.
MOTIVATION: The sensitivity and specificity of branched DNA (bDNA) assays are derived in part through the judicious design of the capture and label extender probes. To minimize non-specific hybridization (NSH) events, which elevate assay background, candidate probes must be computer screened for complementarity with generic sequences present in the assay. RESULTS: We present a software application which allows for rapid and flexible design of bDNA probesets for novel targets. It includes an algorithm for estimating the magnitude of NSH contribution to background, a mechanism for removing probes with elevated contributions, a methodology for the simultaneous design of probesets for multiple targets, and a graphical user interface which guides the user through the design steps. AVAILABILITY: The program is available as a commercial package through the Pharmaceutical Drug Discovery program at Chiron Diagnostics.

19.
Gene 3 endonuclease of bacteriophage T7 has been expressed from the cloned gene, purified, and characterized as to its activity on different DNA substrates. Besides its known strong preference for cutting single-stranded DNA rather than double-stranded DNA, the enzyme has a strong preference for cutting conformationally branched structures in double-stranded DNA, either X or Y-shaped branches. Three types of branched DNA substrates were used: relaxed circular DNAs containing large cruciform structures (a model for Holliday structures, presumed intermediates in genetic recombination); X-shaped molecules having a limited potential for branch migration, made from the cloned phage and bacterial arms of the lambda attachment site; and Y-shaped molecules, made by hybridizing molecules homologous except for a 2 X 21 base-pair palindrome in one of them. Gene 3 endonuclease cuts two opposing strands at or near the branchpoint to resolve these substrates into linear molecules, and does not cut the potentially single-stranded tips of the stem-and-loop structure generated from the palindrome. The position of the cleavage points on the equivalent arm of two X-shaped molecules, constructed from wild-type and mutant lambda attachment sites, show that the enzyme can cut at several different sites within or slightly 5' of the limited region of branch migration. The various activities of gene 3 endonuclease are consistent with the known role of this enzyme in genetic recombination, in maturation and packaging of T7 DNA, and in degradation of host DNA, and suggest that the enzyme recognizes a specific structural feature in DNA. Its cleavage specificity, ready availability, and ability to act at physiological pH and ionic conditions may make gene 3 endonuclease useful as a probe for specific DNA structures or for binding of proteins that alter DNA structure.

20.