Found 20 similar articles; search took 15 ms.
1.
Morgenstern B 《Bioinformatics (Oxford, England)》2000,16(10):948-949
SUMMARY: In the segment-by-segment approach to sequence alignment, pairwise and multiple alignments are generated by comparing gap-free segments of the sequences under study. This method is particularly efficient in detecting local homologies, and it has been used to identify functional regions in large genomic sequences. Herein, an algorithm is outlined that calculates optimal pairwise segment-by-segment alignments in essentially linear space. AVAILABILITY: The program is available at the Bielefeld Bioinformatics Server (BiBiServ) at http://bibiserv.techfak.uni-bielefeld.de/dialign/
2.
A tool for aligning very similar DNA sequences (total citations: 4; self-citations: 0; citations by others: 4)
Chao Kun-Mao; Zhang Jinghui; Ostell James; Miller Webb 《Bioinformatics (Oxford, England)》1997,13(1):75-80
Results: We have produced a computer program, named sim3, that solves the following computational problem. Two DNA sequences are given, where the shorter sequence is very similar to some contiguous region of the longer sequence. Sim3 determines such a similar region of the longer sequence, and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that will convert the shorter sequence to that region. Thus, the alignment scoring scheme is designed to model sequencing errors, rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and some region in the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information. Availability: A version of sim3 for UNIX machines can be obtained by anonymous ftp from ncbi.nlm.nih.gov, in the pub/sim3 directory. Contact: For portable versions for Macs and PCs, contact zjing@sunset.nlm.nih.gov.
3.
MOTIVATION: Automated annotation of Expressed Sequence Tags (ESTs) is becoming increasingly important as EST databases continue to grow rapidly. A common approach to annotation is to align the gene fragments against well-documented databases of protein sequences. The sensitivity of the alignment algorithm is key to the success of such methods. RESULTS: This paper introduces a new algorithm, FramePlus, for DNA-protein sequence alignment. The SCOP database was used to develop a general framework for testing the sensitivity of such alignment algorithms when searching large databases. Using this framework, the performance of FramePlus was found to be somewhat better than other algorithms in the presence of moderate and high rates of frameshift errors, and comparable to Translated Search in the absence of sequencing errors. AVAILABILITY: The source code for FramePlus and the testing datasets are freely available at ftp.compugen.co.il/pub/research. CONTACT: raveh@compugen.co.il.
4.
MOTIVATION: Biologists usually work with textual DNA sequences (successions of A, C, G and T). This representation allows biologists to study the syntax and other linguistic properties of DNA sequences. Nevertheless, such a linear coding offers only a local, one-dimensional view of the molecule. The 3D structure of DNA is known to be very important in many essential biological mechanisms. By using 3D conformation models, one is able to construct a 3D trajectory of a naked DNA molecule. The various studies that we performed showed that two very different textual DNA sequences can have similar 3D structures. RESULTS: In this article, we present new research on 3D pattern matching for DNA sequences. The aim of this work is to enhance conventional pattern matching analyses with 3D-augmented criteria. We have developed an algorithm, based on 3D trajectories, which compares the angles formed by these trajectories and thus quantifies the difference between two 3D DNA sequences. The analysis proceeds from a global scale to a local one. AVAILABILITY: Available on request from the authors.
5.
This paper describes a generic algorithm for finding restriction sites within DNA sequences. The genericity of the algorithm is made possible through the use of set theory. Basic elements of DNA sequences, i.e. nucleotides (bases), are represented in sets, and DNA sequences, whether specific, ambiguous or even protein-coding, are represented as sequences of those sets. The set intersection operation demonstrates its ability to perform pattern-matching correctly on various DNA sequences. The performance analysis showed that the degree of complexity of the pattern matching is reduced from exponential to linear. An example is given to show the actual and potential restriction sites, derived by the generic algorithm, in the DNA sequence template coding for a synthetic calmodulin. Received on October 2, 1990; accepted on December 18, 1990
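The set-intersection idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it is a plain sliding-window scan, and the function name `find_sites` and the `potential` flag are hypothetical. It assumes the standard IUPAC ambiguity codes. A "potential" site is taken here to mean that every position of the sequence shares at least one base with the site pattern; an "actual" site is taken to mean that every base allowed by the sequence is also allowed by the pattern. Both readings are assumptions about the abstract's terms.

```python
# Standard IUPAC nucleotide codes, each represented as a set of bases.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def find_sites(sequence, site, potential=True):
    """Return 0-based start positions where `site` matches `sequence`.

    potential=True  : a position matches if the two base sets intersect.
    potential=False : a position matches only if the sequence's base set
                      is a subset of the site's base set (hypothetical
                      reading of an "actual" site).
    """
    seq_sets = [IUPAC[ch] for ch in sequence.upper()]
    site_sets = [IUPAC[ch] for ch in site.upper()]
    hits = []
    for start in range(len(seq_sets) - len(site_sets) + 1):
        window = seq_sets[start:start + len(site_sets)]
        if potential:
            ok = all(s & p for s, p in zip(window, site_sets))
        else:
            ok = all(s <= p for s, p in zip(window, site_sets))
        if ok:
            hits.append(start)
    return hits
```

For example, `find_sites("AAGANTTCTT", "GAATTC")` reports a potential EcoRI site at position 2, because `N` can stand for `A`, while the same call with `potential=False` reports none.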
6.
Kamimura R 《International journal of neural systems》2004,14(1):9-26
In this paper, we extend our greedy network-growing algorithm to multi-layered networks. With multi-layered networks, we can solve many complex problems that single-layered networks fail to solve. In addition, the network-growing algorithm is used in conjunction with teacher-directed learning that produces appropriate outputs without computing errors between targets and outputs. Thus, the present algorithm is a very efficient network-growing algorithm. The new algorithm was applied to three problems: the famous vertical-horizontal lines detection problem, a medical data problem and a road classification problem. In all these cases, experimental results confirmed that the method could solve problems that single-layered networks failed to. In addition, information maximization makes it possible to extract salient features in input patterns.
7.
8.
A CLIQUE algorithm using DNA computing techniques based on closed-circle DNA sequences (total citations: 1; self-citations: 0; citations by others: 1)
DNA computing has been applied in broad fields such as graph theory, finite state problems, and combinatorial problems. DNA computing approaches are well suited to solving many combinatorial problems because of their vast parallelism and high-density storage. The CLIQUE algorithm is one of the grid-based clustering techniques for spatial data. It amounts to a combinatorial problem over the dense cells. Therefore we use DNA computing with closed-circle DNA sequences to execute the CLIQUE algorithm for two-dimensional data. In our study, the process of clustering becomes a parallel biochemical reaction, and the DNA sequences representing the marked cells can be combined to form closed-circle DNA sequences. This strategy is a new application of DNA computing. Although the strategy applies only to two-dimensional data, it provides a new idea: treat the grid cells as vertices in a graph and transform the search problem into a combinatorial problem.
9.
There is a pressing need to align the growing set of expressed sequence tags (ESTs) with the newly sequenced human genome. However, the problem is complicated by the exon/intron structure of eukaryotic genes, misread nucleotides in ESTs, and the millions of repetitive sequences in genomic sequences. To solve this problem, algorithms that use dynamic programming (DP) have been proposed. In reality, however, these algorithms require an enormous amount of processing time. In an effort to improve the computational efficiency of these classical DP algorithms, we developed software that fully utilizes lookup tables to efficiently detect the start and end points of an EST within a given DNA sequence, and then promptly identify exons and introns. In addition, the locations of all splice sites must be calculated correctly with high sensitivity and accuracy, while retaining high computational efficiency. This goal is hard to accomplish in practice, due to misread nucleotides in ESTs and repetitive sequences in the genome. Nevertheless, we present two heuristics that effectively settle this issue. Experimental results confirm that our technique improves the overall computation time by orders of magnitude compared with common tools, such as SIM4 and BLAT, and simultaneously attains high sensitivity and accuracy against a clean dataset of documented genes.
10.
K M Chao 《Bioinformatics (Oxford, England)》1999,15(4):298-304
MOTIVATION: Given a genomic DNA sequence, it is still an open problem to determine its coding regions, i.e. the region consisting of exons and introns. The comparison of cDNA and genomic DNA helps the understanding of coding regions. For such an application, it might be adequate to use the restricted affine gap penalties which penalize long gaps with a constant penalty. RESULTS: Several techniques developed for solving the approximate string-matching problem are employed to yield efficient algorithms for computing the optimal alignment with restricted affine gap penalties. In particular, efficient algorithms can be derived based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the cost tables. We have implemented the above methods in C on Sun workstations running SunOS Unix. Preliminary experiments show that these approaches are very promising for aligning a cDNA sequence with a genomic DNA sequence. AVAILABILITY: Calign is available free of charge by anonymous ftp at: iubio.bio.indiana.edu, directory: molbio/align, files: calign.driver.c, calign.c. Another URL reference for the files is http://iubio.bio.indiana.edu/soft/molbio/align/+ ++calign.c.
11.
12.
Phylogenetic diversity and the greedy algorithm (total citations: 1; self-citations: 0; citations by others: 1)
Steel M 《Systematic biology》2005,54(4):527-529
Given a phylogenetic tree with leaves labeled by a collection of species, and with weighted edges, the "phylogenetic diversity" of any subset of the species is the sum of the edge weights of the minimal subtree connecting the species. This measure is relevant in biodiversity conservation where one may wish to compare different subsets of species according to how much evolutionary variation they encompass. In this note we show that phylogenetic diversity has an attractive mathematical property that ensures that we can solve the following problem easily by the greedy algorithm: find a subset of the species of any given size k of maximal phylogenetic diversity. We also describe an extension of this result that also allows weights to be assigned to species.
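The definition of phylogenetic diversity in this abstract, and one common reading of the greedy procedure, can be sketched in Python as follows. The sketch assumes an unrooted tree stored as a symmetric adjacency dictionary, assumes the greedy procedure starts from the pair of species that are furthest apart and then repeatedly adds the species that increases the diversity most, and uses hypothetical names (`phylogenetic_diversity`, `greedy_max_pd`). It recomputes the diversity from scratch at each step, so it favours clarity over speed.

```python
from itertools import combinations


def phylogenetic_diversity(tree, species):
    """Sum of edge weights of the minimal subtree connecting `species`.

    `tree` maps each node to a dict {neighbour: edge_weight} and must be
    symmetric. Fewer than two species span no edges, so the result is 0.
    """
    species = list(species)
    if len(species) < 2:
        return 0.0
    root = species[0]
    parent = {root: None}
    stack = [root]
    while stack:  # depth-first search to orient the tree away from `root`
        node = stack.pop()
        for nbr in tree[node]:
            if nbr not in parent:
                parent[nbr] = node
                stack.append(nbr)
    edges = set()  # union of the paths from `root` to every other species
    for leaf in species[1:]:
        node = leaf
        while parent[node] is not None:
            edges.add((node, parent[node]))
            node = parent[node]
    return sum(tree[child][par] for child, par in edges)


def greedy_max_pd(tree, leaves, k):
    """Greedily pick k >= 2 leaves, adding the best next leaf each round."""
    if k < 2 or k > len(leaves):
        raise ValueError("k must satisfy 2 <= k <= number of leaves")
    chosen = list(max(combinations(leaves, 2),
                      key=lambda pair: phylogenetic_diversity(tree, pair)))
    while len(chosen) < k:
        remaining = [x for x in leaves if x not in chosen]
        chosen.append(max(
            remaining,
            key=lambda x: phylogenetic_diversity(tree, chosen + [x])))
    return chosen


# Hypothetical four-species tree with internal nodes "u" and "v".
example_tree = {
    "a": {"u": 1.0}, "b": {"u": 2.0}, "c": {"v": 3.0}, "d": {"v": 1.5},
    "u": {"a": 1.0, "b": 2.0, "v": 1.0},
    "v": {"c": 3.0, "d": 1.5, "u": 1.0},
}
```

On `example_tree`, the furthest pair is `b` and `c`, and the third species added is `d`, because it contributes a longer pendant edge than `a`.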
13.
G-PRIMER, a web-based primer design program, has been developed to compute a minimal primer set specifically annealed to all the open reading frames in a given microbial genome. This program has been successfully used in the microarray experiment for analyzing the expression of genes in the Xanthomonas campestris genome. AVAILABILITY: It is available at http://mammoth.bii.a-star.edu.sg/gprimer/. Its source code is available upon request.
14.
15.
A novel algorithm, GS-Aligner, that uses bit-level operations was developed for aligning genomic sequences. GS-Aligner is efficient in terms of both time and space for aligning two very long genomic sequences and for identifying genomic rearrangements such as translocations and inversions. It is suitable for aligning fairly divergent sequences such as human and mouse genomic sequences. It consists of several efficient components: bit-level coding, search for matching segments between the two sequences as alignment anchors, longest increasing subsequence (LIS), and optimal local alignment. Efforts have been made to reduce the execution time of the program to make it truly practical for aligning very long sequences. Empirical tests suggest that for relatively divergent sequences, such as sequences from different mammalian orders or from a mammal and a nonmammalian vertebrate, GS-Aligner performs better than existing methods. The program and data can be downloaded from http://pondside.uchicago.edu/~lilab/ and http://webcollab.iis.sinica.edu.tw/~biocom.
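The longest-increasing-subsequence (LIS) step named in this abstract can be illustrated with a short Python sketch. It is a generic LIS chaining routine, not GS-Aligner's implementation: it assumes each anchor is a pair of start positions `(pos_a, pos_b)` in the two sequences, handles only collinear anchors (no inversions or translocations), and the function name `chain_anchors` is hypothetical.

```python
from bisect import bisect_left


def chain_anchors(anchors):
    """Return a longest chain of anchors that increases in both sequences.

    `anchors` is an iterable of (pos_a, pos_b) pairs. Anchors are sorted
    by pos_a; ties are ordered by descending pos_b so that two anchors
    with the same pos_a can never both enter a strictly increasing chain.
    """
    ordered = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails = []      # tails[k]: smallest final pos_b of a chain of length k+1
    tails_idx = []  # index into `ordered` of the anchor stored in tails[k]
    prev = [None] * len(ordered)  # back-pointers for chain reconstruction
    for i, (_, pos_b) in enumerate(ordered):
        k = bisect_left(tails, pos_b)  # strict increase in pos_b
        if k == len(tails):
            tails.append(pos_b)
            tails_idx.append(i)
        else:
            tails[k] = pos_b
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else None
    chain = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:  # walk the back-pointers from the chain's end
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]
```

The binary search keeps the running time at O(n log n) in the number of anchors, which is the usual reason for choosing LIS over a quadratic chaining pass on very long sequences.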
16.
P Taylor 《Nucleic acids research》1986,14(1):437-441
This paper describes a comprehensive program for translating one or two DNA sequences into amino acid sequences. Written in FORTRAN, it was designed for maximum flexibility of use and easy maintenance, modification and portability. It has full comments throughout.
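As a rough illustration of the task this abstract describes, here is a minimal Python sketch of DNA-to-protein translation. It is not the FORTRAN program from the paper: it assumes the standard genetic code, one-letter amino acid codes with `*` for stop codons, and `X` for any codon containing a non-ACGT symbol. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate `dna` starting at offset `frame` (0, 1 or 2).

    Trailing bases that do not fill a whole codon are ignored.
    Codons with symbols other than A, C, G, T are translated as 'X'.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```

Calling `translate` once per frame gives the three forward reading frames; reverse-strand frames would need the reverse complement first, which this sketch leaves out.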
17.
A convenient method of aligning large DNA molecules on bare mica surfaces for atomic force microscopy (total citations: 8; self-citations: 0; citations by others: 8)
Large DNA molecules remain difficult to image by atomic force microscopy (AFM) because of their tendency to aggregate. A method is described to align long DNA fibers in a single direction on unmodified mica to facilitate AFM studies. The clear background, minimal overstretching, high reproducibility and convenience of this aligning procedure make it useful for physical mapping of genome regions and for studies of DNA-protein complexes.
18.
O White T Dunning G Sutton M Adams J C Venter C Fields 《Nucleic acids research》1993,21(16):3829-3838
Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries that are derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants are present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C.elegans EST data sets but it is not apparently associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed with similarity searches using sequences from the relevant data sets.
19.
Keith JM Adams P Bryant D Kroese DP Mitchelson KR Cochran DA Lala GH 《Bioinformatics (Oxford, England)》2002,18(11):1494-1499
MOTIVATION: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. RESULTS: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
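The objective described in this abstract, a sequence that minimises the sum of pairwise distances to the inputs, can be sketched in Python with a very simple simulated-annealing loop. This is not the authors' algorithm. It assumes unit-cost edit (Levenshtein) distance as the pairwise distance, single-base substitution, insertion and deletion as the move set, and a linear cooling schedule; the function names and parameters (`steps`, `start_temp`, `seed`) are hypothetical.

```python
import math
import random


def edit_distance(a, b):
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def total_distance(candidate, sequences):
    """Objective: sum of distances from `candidate` to every input."""
    return sum(edit_distance(candidate, s) for s in sequences)


def anneal_consensus(sequences, steps=2000, start_temp=2.0, seed=0):
    """Return (best_sequence, best_cost) found by simulated annealing."""
    rng = random.Random(seed)
    current = sequences[0]
    cur_cost = total_distance(current, sequences)
    best, best_cost = current, cur_cost
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9  # linear cooling
        pos = rng.randrange(len(current) + 1)
        move = rng.choice(("sub", "ins", "del"))
        if move == "sub" and pos < len(current):
            cand = current[:pos] + rng.choice("ACGT") + current[pos + 1:]
        elif move == "ins":
            cand = current[:pos] + rng.choice("ACGT") + current[pos:]
        elif move == "del" and pos < len(current) and len(current) > 1:
            cand = current[:pos] + current[pos + 1:]
        else:
            continue  # move not applicable at this position
        cost = total_distance(cand, sequences)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand, cost
    return best, best_cost
```

Each evaluation of `total_distance` costs time linear in the number of inputs and roughly quadratic in sequence length, which matches the scaling the abstract reports, although the paper's actual distance measure and move set may differ.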
20.