Similar literature
20 similar documents found.
1.
SUMMARY: In the segment-by-segment approach to sequence alignment, pairwise and multiple alignments are generated by comparing gap-free segments of the sequences under study. This method is particularly efficient in detecting local homologies, and it has been used to identify functional regions in large genomic sequences. Herein, an algorithm is outlined that calculates optimal pairwise segment-by-segment alignments in essentially linear space. AVAILABILITY: The program is available at the Bielefeld Bioinformatics Server (BiBiServ) at http://bibiserv.techfak.uni-bielefeld.de/dialign/

2.
A tool for aligning very similar DNA sequences   Total citations: 4, self-citations: 0, citations by others: 4
Results: We have produced a computer program, named sim3, that solves the following computational problem. Two DNA sequences are given, where the shorter sequence is very similar to some contiguous region of the longer sequence. Sim3 determines such a similar region of the longer sequence, and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that will convert the shorter sequence to that region. Thus, the alignment scoring scheme is designed to model sequencing errors, rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and some region in the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information. Availability: A version of sim3 for UNIX machines can be obtained by anonymous ftp from ncbi.nlm.nih.gov, in the pub/sim3 directory. Contact: For portable versions for Macs and PCs, contact zjing@sunset.nlm.nih.gov.

3.
MOTIVATION: Automated annotation of Expressed Sequence Tags (ESTs) is becoming increasingly important as EST databases continue to grow rapidly. A common approach to annotation is to align the gene fragments against well-documented databases of protein sequences. The sensitivity of the alignment algorithm is key to the success of such methods. RESULTS: This paper introduces a new algorithm, FramePlus, for DNA-protein sequence alignment. The SCOP database was used to develop a general framework for testing the sensitivity of such alignment algorithms when searching large databases. Using this framework, the performance of FramePlus was found to be somewhat better than other algorithms in the presence of moderate and high rates of frameshift errors, and comparable to Translated Search in the absence of sequencing errors. AVAILABILITY: The source code for FramePlus and the testing datasets are freely available at ftp.compugen.co.il/pub/research. CONTACT: raveh@compugen.co.il.

4.
MOTIVATION: Biologists usually work with textual DNA sequences (successions of A, C, G and T). This representation allows biologists to study the syntax and other linguistic properties of DNA sequences. Nevertheless, such a linear coding offers only a local, one-dimensional view of the molecule. The 3D structure of DNA is known to be very important in many essential biological mechanisms. By using 3D conformation models, one is able to construct a 3D trajectory of a naked DNA molecule. The various studies that we performed showed that two very different textual DNA sequences can have similar 3D structures. RESULTS: In this article, we present new research on 3D pattern matching for DNA sequences. The aim of this work is to enhance conventional pattern matching analyses with 3D-augmented criteria. We have developed an algorithm, based on 3D trajectories, which compares the angles formed by these trajectories and thus quantifies the difference between two 3D DNA sequences. The analysis proceeds from a global scale to a local one. AVAILABILITY: Available on request from the authors.

5.
This paper describes a generic algorithm for finding restriction sites within DNA sequences. The ‘genericity’ of the algorithm is made possible through the use of set theory. Basic elements of DNA sequences, i.e. nucleotides (bases), are represented in sets, and DNA sequences, whether specific, ambiguous or even protein-coding, are represented as sequences of those sets. The set intersection operation demonstrates its ability to perform pattern-matching correctly on various DNA sequences. The performance analysis showed that the degree of complexity of the pattern matching is reduced from exponential to linear. An example is given to show the actual and potential restriction sites, derived by the generic algorithm, in the DNA sequence template coding for a synthetic calmodulin. Received on October 2, 1990; accepted on December 18, 1990
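The set-based matching idea in this abstract can be illustrated with a short sketch. It is not the authors' program: it assumes the standard IUPAC ambiguity codes as the nucleotide sets, and the function name `find_sites` is hypothetical. A position matches when the set for the sequence symbol and the set for the site symbol have a non-empty intersection, so ambiguous input yields potential as well as actual sites.

```python
# IUPAC nucleotide codes mapped to the sets of bases they stand for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def find_sites(sequence, site):
    """Return 0-based start positions where every position of `site`
    shares at least one base with the corresponding sequence position."""
    seq_sets = [IUPAC[ch] for ch in sequence.upper()]
    site_sets = [IUPAC[ch] for ch in site.upper()]
    hits = []
    for start in range(len(seq_sets) - len(site_sets) + 1):
        # A non-empty intersection at every offset means the site may occur here.
        if all(seq_sets[start + j] & site_sets[j] for j in range(len(site_sets))):
            hits.append(start)
    return hits


print(find_sites("AAGAATTCGG", "GAATTC"))  # exact EcoRI site
print(find_sites("AAGANTTCGG", "GAATTC"))  # potential site through the ambiguous N
```

This sketch scans every start position and makes no claim about the complexity result reported in the abstract.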

6.
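Building on existing sequence alignment algorithms for biological macromolecules, and combining them with the critical path method from graph theory, a new algorithm is proposed for computing the maximum degree of pairing between two oligonucleotide sequences. Using this algorithm together with a generate-and-test approach, a set of oligonucleotide sequences of a given length that is suitable for DNA computing can be found. The hybridization specificity of a set of sequences designed with the algorithm was also verified by DNA chip hybridization.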

7.
MOTIVATION: Given a genomic DNA sequence, it is still an open problem to determine its coding regions, i.e. the region consisting of exons and introns. The comparison of cDNA and genomic DNA helps the understanding of coding regions. For such an application, it might be adequate to use the restricted affine gap penalties which penalize long gaps with a constant penalty. RESULTS: Several techniques developed for solving the approximate string-matching problem are employed to yield efficient algorithms for computing the optimal alignment with restricted affine gap penalties. In particular, efficient algorithms can be derived based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the cost tables. We have implemented the above methods in C on Sun workstations running SunOS Unix. Preliminary experiments show that these approaches are very promising for aligning a cDNA sequence with a genomic DNA sequence. AVAILABILITY: Calign is available free of charge by anonymous ftp at: iubio.bio.indiana.edu, directory: molbio/align, files: calign.driver.c calign.c. Another URL reference for the files is http://iubio.bio.indiana.edu/soft/molbio/align/+ ++calign.c.

8.
In this paper, we extend our greedy network-growing algorithm to multi-layered networks. With multi-layered networks, we can solve many complex problems that single-layered networks fail to solve. In addition, the network-growing algorithm is used in conjunction with teacher-directed learning that produces appropriate outputs without computing errors between targets and outputs. Thus, the present algorithm is a very efficient network-growing algorithm. The new algorithm was applied to three problems: the famous vertical-horizontal lines detection problem, a medical data problem and a road classification problem. In all these cases, experimental results confirmed that the method could solve problems that single-layered networks failed to solve. In addition, information maximization makes it possible to extract salient features in input patterns.

9.
Zhang H  Liu X 《Bio Systems》2011,105(1):73-82
DNA computing has been applied in broad fields such as graph theory, finite state problems, and combinatorial problems. DNA computing approaches are well suited to solving many combinatorial problems because of their vast parallelism and high-density storage. The CLIQUE algorithm is one of the grid-based clustering techniques for spatial data. It is a combinatorial problem over the dense cells. Therefore we utilize DNA computing using closed-circle DNA sequences to execute the CLIQUE algorithm for two-dimensional data. In our study, the process of clustering becomes a parallel bio-chemical reaction and the DNA sequences representing the marked cells can be combined to form closed-circle DNA sequences. This strategy is a new application of DNA computing. Although the strategy is only for two-dimensional data, it provides a new idea: to consider the grids to be vertices in a graph and transform the search problem into a combinatorial problem.

10.
There is a pressing need to align the growing set of expressed sequence tags (ESTs) with the newly sequenced human genome. However, the problem is complicated by the exon/intron structure of eukaryotic genes, misread nucleotides in ESTs, and the millions of repetitive sequences in genomic sequences. To solve this problem, algorithms that use dynamic programming have been proposed. In reality, however, these algorithms require an enormous amount of processing time. In an effort to improve the computational efficiency of these classical DP algorithms, we developed software that fully utilizes lookup-tables to detect the start- and endpoints of an EST within a given DNA sequence efficiently, and subsequently promptly identify exons and introns. In addition, the locations of all splice sites must be calculated correctly with high sensitivity and accuracy, while retaining high computational efficiency. This goal is hard to accomplish in practice, due to misread nucleotides in ESTs and repetitive sequences in the genome. Nevertheless, we present two heuristics that effectively settle this issue. Experimental results confirm that our technique improves the overall computation time by orders of magnitude compared with common tools, such as SIM4 and BLAT, and simultaneously attains high sensitivity and accuracy against a clean dataset of documented genes.

11.
12.
This paper describes a computer program designed to look for similarities between pairs of nucleic or amino acid sequences. The program looks either for segments of perfect identity or for regions where, using a scoring matrix, a minimum value is exceeded. The results of comparisons are presented as a matrix which is displayed on a simple graphics terminal. Use of a graphics terminal allows the user to display the whole of the two sequences in one screenful or to home in on regions of interest to examine them in more detail. The program is interactive and so the user can easily see the effect of changes to variables and can use inbuilt editing functions to make insertions to produce alignments of the two sequences. These aligned sequences can then be saved on disk files for further processing.
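The comparison matrix described above can be sketched as a windowed dot plot. This is an illustrative sketch only, not the program from the paper: the window length, the threshold, the default identity scoring function and the names `dot_matrix` and `render` are all assumptions.

```python
def dot_matrix(seq_a, seq_b, window=3, threshold=3,
               score=lambda x, y: 1 if x == y else 0):
    """Mark cell (i, j) when the summed score of the diagonal window
    starting at seq_a[i] and seq_b[j] reaches `threshold`."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            row.append(total >= threshold)
        matrix.append(row)
    return matrix


def render(matrix):
    """Draw the matrix as text: '*' for a marked cell, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row)
                     for row in matrix)


print(render(dot_matrix("ACGTACGT", "ACGTTCGT", window=3, threshold=3)))
```

Setting `threshold` equal to `window` with the identity score reports only segments of perfect identity; a lower threshold, or a substitution-matrix lookup passed as `score`, corresponds to the second search mode mentioned in the abstract.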

13.
We present here a fast and sensitive method designed to isolate short nucleotide sequences which have non-random statistical properties and may thus be biologically active. It is based on a first order Markov analysis and allows us to detect statistically significant sequence motifs from six to ten nucleotides long which are significantly shared (or avoided) in the sequences under investigation. This method has been tested on a set of 521 sequences extracted from the Eukaryotic Promoter Database (2). Our results demonstrate the accuracy and the efficiency of the method in that the sequence motifs which are known to act as eukaryotic promoters, such as the TATA-box and the CAAT-box, were clearly identified. In addition we have found other statistically significant motifs, the biological roles of which are yet to be clarified.

14.
Phylogenetic diversity and the greedy algorithm   Total citations: 1, self-citations: 0, citations by others: 1
Steel M 《Systematic biology》2005,54(4):527-529
Given a phylogenetic tree with leaves labeled by a collection of species, and with weighted edges, the "phylogenetic diversity" of any subset of the species is the sum of the edge weights of the minimal subtree connecting the species. This measure is relevant in biodiversity conservation where one may wish to compare different subsets of species according to how much evolutionary variation they encompass. In this note we show that phylogenetic diversity has an attractive mathematical property that ensures that we can solve the following problem easily by the greedy algorithm: find a subset of the species of any given size k of maximal phylogenetic diversity. We also describe an extension of this result that also allows weights to be assigned to species.
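The definition and the greedy selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the paper: the tree is given as a hypothetical list of `(node, node, weight)` edges, the function names are invented, and the greedy procedure is assumed to start from the pair of species with the largest diversity and then add one species at a time.

```python
from itertools import combinations


def phylogenetic_diversity(edges, species):
    """Sum of edge weights of the minimal subtree connecting `species`."""
    species = set(species)
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    # Repeatedly prune degree-1 nodes that are not among the chosen species.
    stack = [n for n in adj if len(adj[n]) == 1 and n not in species]
    while stack:
        n = stack.pop()
        if len(adj[n]) != 1:
            continue
        nbr = next(iter(adj[n]))
        del adj[nbr][n]
        adj[n] = {}
        if len(adj[nbr]) == 1 and nbr not in species:
            stack.append(nbr)
    # Whatever edges survive the pruning form the connecting subtree.
    return sum(w for u, v, w in edges if v in adj[u])


def greedy_max_pd(edges, leaves, k):
    """Greedily pick k species: best pair first, then the best single addition."""
    leaves = sorted(leaves)
    if k <= 1 or len(leaves) < 2:
        return leaves[:max(k, 0)]
    chosen = list(max(combinations(leaves, 2),
                      key=lambda pair: phylogenetic_diversity(edges, pair)))
    while len(chosen) < min(k, len(leaves)):
        best = max((s for s in leaves if s not in chosen),
                   key=lambda s: phylogenetic_diversity(edges, chosen + [s]))
        chosen.append(best)
    return chosen


# Hypothetical four-species tree with internal nodes x and y.
tree = [("a", "x", 2), ("b", "x", 3), ("x", "y", 1), ("c", "y", 4), ("d", "y", 1)]
picked = greedy_max_pd(tree, ["a", "b", "c", "d"], 3)
print(picked, phylogenetic_diversity(tree, picked))
```

Recomputing the diversity from scratch at every step keeps the sketch short; a practical implementation would update it incrementally.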

15.
G-PRIMER, a web-based primer design program, has been developed to compute a minimal primer set specifically annealed to all the open reading frames in a given microbial genome. This program has been successfully used in the microarray experiment for analyzing the expression of genes in the Xanthomonas campestris genome. AVAILABILITY: It is available at http://mammoth.bii.a-star.edu.sg/gprimer/. Its source code is available upon request.

16.
This paper describes a comprehensive program for translating one or two DNA sequences into amino acid sequences. Written in FORTRAN, it was designed for maximum flexibility of use and easy maintenance, modification and portability. It has full comments throughout.

17.
Tan YH  Huang H  Kihara D 《Proteins》2006,64(3):587-600
Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key to successful protein structure prediction. Its importance has been increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself, thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform them in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

18.
19.
A novel algorithm, GS-Aligner, that uses bit-level operations was developed for aligning genomic sequences. GS-Aligner is efficient in terms of both time and space for aligning two very long genomic sequences and for identifying genomic rearrangements such as translocations and inversions. It is suitable for aligning fairly divergent sequences such as human and mouse genomic sequences. It consists of several efficient components: bit-level coding, search for matching segments between the two sequences as alignment anchors, longest increasing subsequence (LIS), and optimal local alignment. Efforts have been made to reduce the execution time of the program to make it truly practical for aligning very long sequences. Empirical tests suggest that for relatively divergent sequences, such as sequences from different mammalian orders or from a mammal and a nonmammalian vertebrate, GS-Aligner performs better than existing methods. The program and data can be downloaded from http://pondside.uchicago.edu/~lilab/ and http://webcollab.iis.sinica.edu.tw/~biocom.
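The role of the longest increasing subsequence (LIS) step can be illustrated with a generic sketch. It is not GS-Aligner's code and says nothing about its bit-level coding: anchors are assumed to be hypothetical `(position_in_a, position_in_b)` pairs, and the function name `chain_anchors` is invented. Sorting anchors by their position in the first sequence and taking the LIS of their positions in the second yields the largest collinear set of anchors.

```python
from bisect import bisect_left


def chain_anchors(anchors):
    """Return the largest subset of anchors that increases strictly
    in both coordinates (an O(n log n) LIS on the second coordinate)."""
    # Ties in the first coordinate are ordered by descending second
    # coordinate so that two such anchors can never both be chained.
    ordered = sorted(anchors, key=lambda a: (a[0], -a[1]))
    tails = []      # tails[k]: smallest end position of any chain of length k + 1
    tail_idx = []   # index in `ordered` of the anchor that achieves tails[k]
    prev = [None] * len(ordered)
    for i, (_, b) in enumerate(ordered):
        k = bisect_left(tails, b)
        if k == len(tails):
            tails.append(b)
            tail_idx.append(i)
        else:
            tails[k] = b
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else None
    # Walk back from the end of the longest chain.
    chain = []
    i = tail_idx[-1] if tail_idx else None
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


anchors = [(1, 10), (5, 3), (8, 12), (12, 6), (15, 20), (20, 25)]
print(chain_anchors(anchors))
```

Anchors left out of the chain are candidates for rearrangements such as translocations, which a real aligner would handle separately.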

20.
Pattern discovery in unaligned DNA sequences is a challenging problem in both computer science and molecular biology. Several different methods and techniques have been proposed so far, but in most cases signals in DNA sequences are very complicated and evade detection. Exact exhaustive methods can solve the problem only for short signals with a limited number of mutations. In this work, we extend exhaustive enumeration to longer patterns as well. In more detail, the basic version of the algorithm presented in this paper, given as input a set of sequences and an error ratio epsilon < 1, finds all patterns that occur in at least q sequences of the set with at most epsilon·m mutations, where m is the length of the pattern. The only restriction is imposed on the location of mutations along the signal. That is, a valid occurrence of a pattern can present at most [epsilon·i] mismatches in the first i nucleotides, and so on. However, we show how the algorithm can also be used when no assumption can be made on the position of mutations. In this case, it is also possible to have an estimate of the probability of finding a signal according to the signal length, the error ratio, and the input parameters. Finally, we discuss some significance measures that can be used to sort the patterns output by the algorithm.
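The restriction on where mutations may fall can be made concrete with a small check. This is a sketch of the validity condition only, not the enumeration algorithm of the paper, and it assumes that the bracket in the abstract denotes rounding down; the function names are hypothetical.

```python
import math
from fractions import Fraction


def is_valid_occurrence(pattern, window, epsilon):
    """True if every prefix of length i has at most floor(epsilon * i)
    mismatches (assumed reading of the bracket notation)."""
    if len(pattern) != len(window):
        return False
    # Parse epsilon from its decimal text so that, e.g., 0.29 * 100 is exactly 29.
    eps = Fraction(str(epsilon))
    mismatches = 0
    for i, (p, w) in enumerate(zip(pattern, window), start=1):
        if p != w:
            mismatches += 1
        if mismatches > math.floor(eps * i):
            return False
    return True


def find_occurrences(pattern, sequence, epsilon):
    """0-based start positions of valid occurrences of `pattern`."""
    m = len(pattern)
    return [s for s in range(len(sequence) - m + 1)
            if is_valid_occurrence(pattern, sequence[s:s + m], epsilon)]


# With epsilon = 0.25 and m = 8, up to 2 mismatches are allowed overall,
# but none in the first three positions.
print(is_valid_occurrence("ACGTACGT", "ACGTTCGA", 0.25))  # late mismatches
print(is_valid_occurrence("ACGTACGT", "TCGTACGT", 0.25))  # early mismatch
```

The second call is rejected even though its total mismatch count is within epsilon·m, which is the positional restriction the abstract describes.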
