Similar literature
1.
We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies these operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the greatest improvement in the objective score is accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package in the Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce a globally optimal alignment, the tests indicate that EGMA consistently builds high-quality alignments compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.
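The exhaustive and greedy loop described above can be sketched as follows. This is a minimal illustration, not the EGMA implementation: it is written in Python rather than Java, it uses only gap shuffles (not gap insertions or deletions), and the sum-of-pairs objective with its scoring values, as well as the names `sp_score`, `gap_shifts` and `refine`, are assumptions made for the example.

```python
def sp_score(aln, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment (assumed objective, not EGMA's)."""
    score = 0
    for col in zip(*aln):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                x, y = col[i], col[j]
                if x == "-" and y == "-":
                    continue  # gap against gap contributes nothing
                if x == "-" or y == "-":
                    score += gap
                else:
                    score += match if x == y else mismatch
    return score


def gap_shifts(aln):
    """Yield every alignment reachable by swapping one gap with a neighbouring residue."""
    for r, row in enumerate(aln):
        for i in range(len(row) - 1):
            if (row[i] == "-") != (row[i + 1] == "-"):
                new_row = row[:i] + row[i + 1] + row[i] + row[i + 2:]
                yield aln[:r] + [new_row] + aln[r + 1:]


def refine(aln):
    """Exhaustive: score every candidate move. Greedy: accept only the best improving one."""
    aln = list(aln)
    best = sp_score(aln)
    while True:
        candidates = [(sp_score(c), c) for c in gap_shifts(aln)]
        if not candidates:
            return aln, best
        top_score, top = max(candidates, key=lambda t: t[0])
        if top_score <= best:
            return aln, best  # no move improves the objective: local optimum
        aln, best = top, top_score


refined, score = refine(["AC-GT", "ACG-T"])
```

Like the method it illustrates, the loop stops at a local optimum, so the result depends on the starting alignment.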

2.
Production of various structures by self-assembling single-stranded DNA molecules is a widely used technology in the field of DNA nanotechnology. The base sequences of the single strands predict the shape of the resulting nanostructure. Therefore, sequence design is crucial for successful structure fabrication. This paper presents a sequence design algorithm based on mismatch minimization that can be applied to every desired DNA structure. With this algorithm, junctions, loops, single- as well as double-stranded regions, and very large structures of up to several thousand base pairs can be handled. The algorithm is fast for most structures. It is implemented in Java; the implementation, called Seed, is publicly available. As an example of successful sequence generation, this paper also presents the fabrication of DNA chain molecules consisting of double-crossover (DX) tiles.

3.
A new computer search strategy has been devised for high-resolution nucleotide sequence analysis. The strategy differs from those used by earlier sequence analysing programs in that it is exhaustive and capable of detecting all possible homologies and other types of relationships between or within sequences irrespective of the pattern of matches and mismatches encountered. The implementation of this strategy into a working algorithm is described. Received on March 1, 1985; accepted on April 24, 1985

4.
Monroe WT  Haselton FR 《BioTechniques》2003,34(1):68-70, 72-3
A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Because a molecular beacon's performance depends on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

5.
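The ant colony genetic algorithm is an improved algorithm that builds on the ant colony algorithm by using a genetic algorithm to optimize its parameters. When the ant colony genetic algorithm is applied to DNA sequence alignment, the results show that this new sequence alignment algorithm is very effective.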

6.
Two tris-benzimidazole derivatives have been designed and synthesized based on the known structures of the bis-benzimidazole stain Hoechst 33258 complexed to short oligonucleotide duplexes derived from single crystal X-ray studies and from NMR. In both derivatives the phenol group has been replaced by a methoxy-phenyl substituent. Whereas one tris-benzimidazole carries an N-methyl-piperazine at the 6-position, the other one has this group replaced by a 2-amino-pyrrolidine ring. This latter substituent results in stronger DNA binding. The optimized synthesis of the drugs is described. The two tris-benzimidazoles exhibit high AT-base pair (bp) selectivity evident in footprinting experiments which show that five to six base pairs are protected by the tris-benzimidazoles as compared to four to five protected by the bis-benzimidazoles. The tris-benzimidazoles bind well to sequences like 5'-TAAAC, 5'-TTTAC and 5'-TTTAT, but it is also evident that they can bind weakly to sequences such as 5'-TATGTT-3' where the continuity of an AT stretch is interrupted by a single G*C base pair.

7.
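Pattern discovery is an important research direction in bioinformatics, but most current algorithms cannot guarantee that the optimal pattern is found. This paper derives a criterion for the similarity relationships among three sequence segments and, using it as a pruning rule, proposes and implements a depth-first exhaustive search algorithm, the criterion search algorithm (CRISA). Theoretical analysis shows that, for the vast majority of pattern discovery problems, CRISA has polynomial time complexity and linear space complexity. Tests on simulated and real biological sequence data also show that CRISA can quickly and completely identify all patterns in the sequences, achieves a better overall evaluation than other algorithms, and can be applied to practical pattern discovery problems.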

8.
The undertaking of large-scale DNA sequencing screens for somatic variants in human cancers requires accurate and rapid processing of traces for variants. Due to their often aneuploid nature and admixed normal tissue, heterozygous variants found in primary cancers are often subtle and difficult to detect. To address these issues, we have developed a mutation detection algorithm, AutoCSA, specifically optimized for the high throughput screening of cancer samples. Availability: http://www.sanger.ac.uk/genetics/CGP/Software/AutoCSA.

9.
A local algorithm for DNA sequence alignment with inversions
A dynamic programming algorithm to find all optimal alignments of DNA subsequences is described. The alignments use not only substitutions, insertions and deletions of nucleotides but also inversions (reversed complements) of substrings of the sequences. The inversion alignments themselves contain substitutions, insertions and deletions of nucleotides. We study the problem of alignment with non-intersecting inversions. To provide a computationally efficient algorithm we restrict candidate inversions to the K highest scoring inversions. An algorithm to find the J best non-intersecting alignments with inversions is also described. The new algorithm is applied to the regions of mitochondrial DNA of Drosophila yakuba and mouse coding for URF6 and cytochrome b and the inversion of the URF6 gene is found. The open problem of intersecting inversions is discussed.
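An inversion in the sense used above is the reversed complement of a substring. The short sketch below shows only that building block, together with an ungapped score of one segment against the inversion of another. It is not the dynamic programming algorithm of the paper, whose inversion alignments also allow insertions and deletions; the function names and the match and mismatch values are assumptions made for illustration.

```python
# Complement table for the four DNA bases (upper case only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reversed complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def inversion_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Score segment a against the inversion of b, position by position, without gaps."""
    inverted = reverse_complement(b)
    return sum(match if x == y else mismatch for x, y in zip(a, inverted))


print(reverse_complement("AACG"))       # CGTT
print(inversion_score("CGTT", "AACG"))  # 4
```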

10.
C. Lee  X. Li  E. W. Jabs  D. Court  C. C. Lin 《Chromosoma》1995,104(2):103-112
The cosmid clone, CX16-2D12, was previously localized to the centromeric region of the human X chromosome and shown to lack human X-specific satellite DNA. A 1.2 kb EcoRI fragment was subcloned from the CX16-2D12 cosmid and was named 2D12/E2. DNA sequencing revealed that this 1,205 bp fragment consisted of approximately five tandemly repeated DNA monomers of 220 bp. DNA sequence homology between the monomers of 2D12/E2 ranged from 72.8% to 78.6%. Interestingly, DNA sequence analysis of the 2D12/E2 clone displayed a change in monomer unit orientation between nucleotide positions 585–586 from a tail-to-head arrangement to a head-to-tail configuration. This may reflect the existence of at least one inversion within this repetitive DNA array in the centromeric region of the human X chromosome. The DNA consensus sequence derived from a compilation of these 220 bp monomers had approximately 62% DNA sequence similarity to the previously determined 8 satellite DNA consensus sequence. Comparison of the 2D12/E2 and 8 consensus sequences revealed a 20 bp DNA sequence that was well conserved in both DNA consensus sequences. Slot-blot analysis revealed that this repetitive DNA sequence comprises approximately 0.015% of the human genome, similar to that found with 8 satellite DNA. These observations suggest that this satellite DNA clone is derived from a subfamily of satellite DNA and is thus designated X satellite DNA. When genomic DNA from six unrelated males and two unrelated females was cut with SstI or HpaI and separated by pulsed-field gel electrophoresis, no restriction fragment length polymorphisms were observed for either X (2D12/E2) or 8 (50E4) probes. Fluorescence in situ hybridization localized the 2D12/E2 clone to the lateral sides of the primary constriction specifically on the human X chromosome.

11.
We propose a new method, called the ‘size leap’ algorithm, of search for motifs of maximum size and common to two fragments at least. It allows the creation of a reduced database of motifs from a set of sequences whose size obeys the series of Fibonacci numbers. The convenience lies in the efficiency of the motif extraction. It can be applied in the establishment of overlap regions for DNA sequence reconstruction and multiple alignment of biological sequences. The method of complete DNA sequence reconstruction by extraction of the longest motifs (‘anchor motifs’) is presented as an application of the size leap algorithm. The details of a reconstruction from three sequenced fragments are given as an example. Received on February 12, 1991; accepted on February 15, 1991

12.
Information theory is a branch of mathematics that overlaps with communications, biology, and medical engineering. Entropy is a measure of the uncertainty in a set of information. In this study, the entropy of each gene and of its set of exons was calculated for orders one to four. The Kullback-Leibler divergence was then calculated from the relative entropy of genes and exons. The resulting Kullback-Leibler distances for the gene and exon sets were used as input to 7 clustering algorithms: single, complete, average, weighted, centroid, median, and K-means. The AdaBoost algorithm was used to aggregate the clustering results. Finally, the results of the AdaBoost algorithm were examined with the GeneMANIA prediction server to explore them from a gene annotation point of view. All calculations were performed using the MATLAB Engineering Software (2015). Investigation of the genes' metabolic pathways based on the gene annotations showed that the proposed clustering method yielded correct, logical, and fast results. At the same time, the method avoids the disadvantages of alignment, allows genes to be considered with their actual length and content, and does not require large amounts of memory for long sequences. We believe that the proposed method could be used alongside other competitive gene clustering methods to group biologically relevant sets of genes. The proposed method can also be seen as a predictive method for genes with weak genomic annotations.
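The entropy and Kullback-Leibler quantities mentioned above can be sketched as follows. The study itself used MATLAB; this Python version is only an illustration. It assumes that "order k" means the Shannon entropy of the k-mer (block) distribution, which the abstract does not state, and the pseudocount and all function names are hypothetical choices for the example.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(seq, k, alphabet="ACGT", pseudocount=1.0):
    """Probability of every k-mer over the alphabet, smoothed by a pseudocount (assumption)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be non-zero wherever p is."""
    return sum(p[m] * math.log2(p[m] / q[m]) for m in p if p[m] > 0)


gene = kmer_distribution("ATGGCGTACGTTAGC", k=2)
exon = kmer_distribution("ATGGCGTAC", k=2)
print(entropy(gene), kl_divergence(exon, gene))
```

The default pseudocount keeps every probability positive, so the divergence is always finite. The divergence is not symmetric, so a clustering step that needs a true distance would have to symmetrize it first.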

13.
Myers' elegant and powerful bit-parallel dynamic programming algorithm for approximate string matching has a restriction that the query length should be within the word size of the computer, typically 64. We propose a modification of Myers' algorithm, in which the modification has a restriction not on the query length but on the maximum number of mismatches (substitutions, insertions, or deletions), which should be less than half of the word size. The time complexity is O(m log |Σ|), where m is the query length and |Σ| is the size of the alphabet Σ. Thus, it is particularly suited for sequences on a small alphabet such as DNA sequences. In particular, it is useful in quickly extending a large number of seed alignments against a reference genome for high-throughput short-read data produced by next-generation DNA sequencers.
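For orientation, the baseline that the abstract starts from can be sketched as below. This is the original bit-parallel scheme in its commonly presented search form, not the modification proposed in the paper. The function name `myers_search` is hypothetical, and because Python integers have arbitrary precision the word-size restriction is only emulated by masking to m bits.

```python
def myers_search(pattern, text):
    """Return, for each text position j, the smallest edit distance between
    the pattern and any substring of the text ending at j."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1   # emulates a machine word of exactly m bits
    high = 1 << (m - 1)   # bit of the last pattern row
    peq = {}              # per-character match bitmasks
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    pv, mv, score = mask, 0, m  # vertical +1 / -1 deltas, score of the last row
    scores = []
    for c in text:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) & mask           # search variant: top row stays 0
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        scores.append(score)
    return scores


print(myers_search("abc", "xabd"))  # [3, 2, 1, 1]
```

Each text character is processed with a constant number of word operations, which is where the speed of the approach comes from when the pattern fits in one machine word.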

14.
An algorithm to determine the probability that a reading frame codes for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand.

15.
The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes as a paradigm the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.
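A much simplified sketch of a subword composition measure is given below. It is not the method of the paper: it uses subwords of one fixed length k instead of all subwords, and the background adjustment (dividing each observed count by the count expected if letters were independent) and the Euclidean distance are assumptions chosen only to make the idea concrete. The function names are hypothetical.

```python
import math
from collections import Counter


def adjusted_profile(seq, k):
    """Observed / expected count for each k-mer seen in seq.
    Expected counts assume independent letters (assumed background model)."""
    n = len(seq) - k + 1
    if n <= 0:
        raise ValueError("sequence shorter than k")
    letter_freq = {c: cnt / len(seq) for c, cnt in Counter(seq).items()}
    observed = Counter(seq[i:i + k] for i in range(n))
    return {
        word: count / (n * math.prod(letter_freq[c] for c in word))
        for word, count in observed.items()
    }


def profile_distance(a, b, k=3):
    """Euclidean distance between adjusted profiles; unseen words count as 0."""
    pa, pb = adjusted_profile(a, k), adjusted_profile(b, k)
    words = pa.keys() | pb.keys()
    return math.sqrt(sum((pa.get(w, 0.0) - pb.get(w, 0.0)) ** 2 for w in words))


print(profile_distance("ACGTACGTAC", "ACGTTCGTAC", k=2))
```

No alignment is computed at any point, which is the property that makes measures of this kind attractive on a genomic scale.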

16.
A strategy for rapid DNA sequence acquisition in an ordered, nonrandom manner, while retaining all of the conveniences of the dideoxy method with M13 transducing phage DNA template, is described. Target DNA 3 to 14 kb in size can be stably carried by our M13 vectors. Suitable targets are stretches of DNA which lack an enzyme recognition site which is unique on our cloning vectors and adjacent to the sequencing primer; current sites that are so useful when lacking are Pst, Xba, HindIII, BglII, EcoRI. By an in vitro procedure, we cut RF DNA once randomly and once specifically, to create thousands of deletions which start at the unique restriction site adjacent to the dideoxy sequencing primer and extend various distances across the target DNA. Phage carrying a desired size of deletions, whose DNA as template will give rise to DNA sequence data in a desired location along the target DNA, may be purified by electrophoresis alive on agarose gels. Phage running in the same location on the agarose gel thus conveniently give rise to nucleotide sequence data from the same kilobase of target DNA.

17.
Naturally-occurring phytases having the required level of thermostability for application in animal feeding have not been found in nature thus far. We decided to de novo construct consensus phytases using primary protein sequence comparisons. A consensus enzyme based on 13 fungal phytase sequences had normal catalytic properties, but showed an unexpected 15-22 degrees C increase in unfolding temperature compared with each of its parents. As a first step towards understanding the molecular basis of increased heat resistance, the crystal structure of consensus phytase was determined and compared with that of Aspergillus niger phytase. Aspergillus niger phytase unfolds at much lower temperatures. In most cases, consensus residues were indeed expected, based on comparisons of both three-dimensional structures, to contribute more to phytase stabilization than non-consensus amino acids. For some consensus amino acids, predicted by structural comparisons to destabilize the protein, mutational analysis was performed. Interestingly, these consensus residues in fact increased the unfolding temperature of the consensus phytase. In summary, for fungal phytases apparently an unexpected direct link between protein sequence conservation and protein stability exists.

18.
A series of hierarchical chemical reactivity calculations have been performed to elucidate the alkylation properties of a methyldiazonium ion toward DNA base sites. Both MINDO/3 and CNDO/2 approximate methods have been employed. For the isolated bases the O6 of guanine is predicted to be the most reactive site. This prediction may also be relevant to single-stranded DNA chains containing guanine. For base-pairs, the N7 and O6 sites on guanine are about equally favored for alkylation. The previous study of aziridinium ion alkylation gave about the same results with N7 guanine modestly favored as the preferred site of alkylation for base-pairs. In composite we conclude that N7 guanine and/or O6 guanine will be the preferred sites for alkylation by a methyldiazonium ion but cannot distinguish between these two in terms of chemical specificity.

19.
Genetic instability of an artificial palindrom DNA sequence
A short DNA palindrome, produced by head-to-head ligation of a 29 bp DNA fragment, was inserted into a 27,000 bp plasmid DNA element composed of two functional replicons (R6K, ColE1). Several plasmid types containing a single copy of this palindrome at different insertion locations on the R6K sequence were obtained. The palindrome was engineered to possess a unique EcoRI recognition sequence at its axis of symmetry. The presence of this restriction site made it possible to monitor the genetic stability of the artificial palindrome at its different insertion loci. Out of 5 different insertion locations, one (in pAS807) was found to lead to a significant destabilization of the palindrome. This insertion site lies within the replication control region of R6K. We have shown that the inserted palindrome in pAS807 does not affect the functionality of the R6K replication origins. Excision of the palindrome sequences from pAS807 was not accompanied by loss of the adjacent R6K DNA sequences. Different deletion derivatives of pAS807 were generated in vitro in order to determine the driving unit of DNA sequences around the palindrome that are involved in its excision. The results imply that large DNA structure(s) around the palindrome are involved in its excision. Complete deletion of R6K sequences from either the left or the right side of the palindrome resulted in new configurations which stabilized the palindrome. R6K DNA sequences extending more than 270 bp on both sides of the palindrome are necessary for the transition from a palindrome-stable to a palindrome-unstable state. In addition, evidence is presented to show that the excision of palindrome sequences requires a functional polymerase I but not the gene product of recA.

20.
High-resolution NMR structure of an AT-rich DNA sequence
We have determined, by proton NMR and complete relaxation matrix methods, the high-resolution structure of a DNA oligonucleotide in solution with nine contiguous AT base pairs. The stretch of AT pairs, TAATTATAATTATAATTA, is imbedded in a 27-nucleotide stem-and-loop construct, which is stabilized by terminal GC base pairs and an extraordinarily stable DNA loop GAA (Hirao et al., 1994, Nucleic Acids Res. 22, 576–582). The AT-rich sequence has three repeated TAATTA motifs, one in the reverse orientation. Comparison of the local conformations of the three motifs shows that the sequence context has a minor effect here: atomic RMSD between the three TAATTA fragments is 0.4–0.5 Å, while each fragment is defined within the RMSD of 0.3–0.4 Å. The AT-rich stem also contains a consensus sequence for the Pribnow box, TATAAT. The TpA, ApT, and TpTApA steps have characteristic local conformations, a combination of which determines a unique sequence-dependent pattern of minor groove width variation. All three TpA steps are locally bent in the direction compressing the major groove of DNA. These bends, however, compensate each other, because of their relative position in the sequence, so that the overall helical axis is essentially straight.
