
1.

A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences

David J Russell Samuel F Way Andrew K Benson Khalid Sayood 《BMC bioinformatics》2010,11(1):601

Background

We propose a sequence clustering algorithm and compare the partition quality and execution time of the proposed algorithm with those of a popular existing algorithm. The proposed clustering algorithm uses a grammar-based distance metric to determine partitioning for a set of biological sequences. The algorithm performs clustering in which new sequences are compared with cluster-representative sequences to determine membership. If comparison fails to identify a suitable cluster, a new cluster is created.
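The representative-based procedure described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the paper's grammar-based distance is not specified here, so a k-mer Jaccard distance is swapped in as a stand-in, and the names `greedy_cluster`, `jaccard_distance`, and the `threshold` parameter are hypothetical. The first member of each cluster is assumed to serve as its representative.

```python
def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=3):
    """Stand-in distance (NOT the paper's grammar-based metric)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka and not kb:
        return 0.0
    return 1.0 - len(ka & kb) / len(ka | kb)


def greedy_cluster(seqs, threshold=0.5, distance=jaccard_distance):
    """Assign each sequence to the nearest cluster representative,
    or open a new cluster when no representative is close enough."""
    representatives = []  # assumption: first member represents its cluster
    clusters = []
    for s in seqs:
        best_idx, best_d = None, None
        for i, rep in enumerate(representatives):
            d = distance(s, rep)
            if best_d is None or d < best_d:
                best_idx, best_d = i, d
        if best_idx is not None and best_d <= threshold:
            clusters[best_idx].append(s)
        else:
            # Comparison failed to identify a suitable cluster: create one.
            representatives.append(s)
            clusters.append([s])
    return clusters
```

Because each new sequence is compared only with representatives rather than with every earlier sequence, the number of distance evaluations grows with the number of clusters instead of the number of sequences.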

2.

Grammar-based distance in progressive multiple sequence alignment

David J Russell Hasan H Otu Khalid Sayood 《BMC bioinformatics》2008,9(1):306

Background

We propose a multiple sequence alignment (MSA) algorithm and compare the alignment quality and execution time of the proposed algorithm with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment occurs via pairwise aligning new sequences with an ensemble of the sequences previously aligned.

3.

NBLAST: a cluster variant of BLAST for NxN comparisons

Michel Dumontier Christopher WV Hogue 《BMC bioinformatics》2002,3(1):13

Background

The BLAST algorithm compares biological sequences to one another in order to determine shared motifs and common ancestry. However, the comparison of all non-redundant (NR) sequences against all other NR sequences is a computationally intensive task. We developed NBLAST as a cluster computer implementation of the BLAST family of sequence comparison programs for the purpose of generating pre-computed BLAST alignments and neighbour lists of NR sequences.

4.

A fast algorithm for determining the best combination of local alignments to a query sequence

Gavin C Conant Andreas Wagner 《BMC bioinformatics》2004,5(1):62

Background

Existing sequence alignment algorithms assume that similarities between DNA or amino acid sequences are linearly ordered. That is, stretches of similar nucleotides or amino acids are in the same order in both sequences. Recombination perturbs this order. An algorithm that can reconstruct sequence similarity despite rearrangement would be helpful for reconstructing the evolutionary history of recombined sequences.

5.

ICRPfinder: a fast pattern design algorithm for coding sequences and its application in finding potential restriction enzyme recognition sites

Chao Li Yuhua Li Xiangmin Zhang Phillip Stafford Valentin Dinu 《BMC bioinformatics》2009,10(1):286

Background

Restriction enzymes can produce easily definable segments from DNA sequences by using a variety of cut patterns. There are, however, no software tools that can aid in gene building, that is, modifying wild-type DNA sequences to express the same wild-type amino acid sequences but with enhanced codons, specific cut sites, unique post-translational modifications, and other engineered-in components for recombinant applications. A fast DNA pattern design algorithm, ICRPfinder, is provided in this paper and applied to find or create potential recognition sites in target coding sequences.

6.

Combined protein construct and synthetic gene engineering for heterologous protein expression and crystallization using Gene Composer

Amy Raymond Scott Lovell Don Lorimer John Walchli Mark Mixon Ellen Wallace Kaitlin Thompkins Kimberly Archer Alex Burgin Lance Stewart 《BMC biotechnology》2009,9(1):37-15

7.

EXMOTIF: efficient structured motif extraction

Yongqiang Zhang Mohammed J Zaki 《Algorithms for molecular biology : AMB》2006,1(1):21-18

Background

Extracting motifs from sequences is a mainstay of bioinformatics. We look at the problem of mining structured motifs, which allow variable-length gaps between simple motif components. We propose an efficient algorithm, called EXMOTIF, that, given one or more sequences and a structured motif template, extracts all frequent structured motifs that have quorum q. Potential applications of our method include the extraction of single/composite regulatory binding sites in DNA sequences.

8.

An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

Akito Taneda 《BMC bioinformatics》2008,9(1):521

Background

Aligning RNA sequences with low sequence identity has been a challenging problem, since such a computation essentially needs a high-complexity algorithm to take structural conservation into account. Although many sophisticated algorithms for the purpose have been proposed to date, further improvement in efficiency is necessary to accelerate large-scale applications, including non-coding RNA (ncRNA) discovery.

9.

Comparing sequences without using alignments: application to HIV/SIV subtyping

Gilles Didier Laurent Debomy Maude Pupin Ming Zhang Alexander Grossmann Claudine Devauchelle Ivan Laprevotte 《BMC bioinformatics》2007,8(1):1

Background

In general, the construction of trees is based on sequence alignments. This procedure, however, leads to loss of information when parts of sequence alignments (for instance ambiguous regions) are deleted before tree building. To overcome this difficulty, one of us previously introduced a new and rapid algorithm that calculates dissimilarity matrices between sequences without preliminary alignment.

10.

Optimizing amino acid substitution matrices with a local alignment kernel

Hiroto Saigo Jean-Philippe Vert Tatsuya Akutsu 《BMC bioinformatics》2006,7(1):246-12

Background

Detecting remote homologies by direct comparison of protein sequences remains a challenging task. We had previously developed a similarity score between sequences, called a local alignment kernel, that exhibits good performance for this task in combination with a support vector machine. The local alignment kernel depends on an amino acid substitution matrix. Since commonly used BLOSUM or PAM matrices for scoring amino acid matches have been optimized to be used in combination with the Smith-Waterman algorithm, the matrices optimal for the local alignment kernel can be different.

11.

MUSCLE: a multiple sequence alignment method with reduced time and space complexity (Total citations: 4; self-citations: 0; citations by others: 4)

Robert C Edgar 《BMC bioinformatics》2004,5(1):113

Background

In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and/or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles.

12.

SMOTIF: efficient structured pattern and profile motif search

Yongqiang Zhang Mohammed J Zaki 《Algorithms for molecular biology : AMB》2006,1(1):22-24

Background

A structured motif allows variable-length gaps between several components, where each component is a simple motif that allows either no gaps or only fixed-length gaps. The motif can be represented either as a pattern or as a profile (also called a positional weight matrix). We propose an efficient algorithm, called SMOTIF, to solve the structured motif search problem, i.e., given one or more sequences and a structured motif, SMOTIF searches the sequences for all occurrences of the motif. Potential applications include searching for long terminal repeat (LTR) retrotransposons and composite regulatory binding sites in DNA sequences.
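To make the notion of a structured motif concrete, the sketch below searches a sequence for a pattern-style motif whose components are separated by gaps of bounded, variable length. It is a plain regular-expression illustration and not the SMOTIF algorithm; the function name `find_structured_motif` and its argument layout are hypothetical. It reports each start position at which at least one occurrence begins, rather than enumerating every gap combination.

```python
import re


def find_structured_motif(seq, components, gaps):
    """Return start positions where the structured motif occurs.

    components: list of literal simple motifs, e.g. ["TTGACA", "TATAAT"]
    gaps: list of (min_len, max_len) pairs, one between each pair of
          adjacent components (hypothetical argument layout).
    """
    if len(gaps) != len(components) - 1:
        raise ValueError("need exactly one gap range between adjacent components")
    parts = [re.escape(components[0])]
    for (lo, hi), comp in zip(gaps, components[1:]):
        parts.append(".{%d,%d}" % (lo, hi))  # variable-length gap
        parts.append(re.escape(comp))
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + "".join(parts) + ")")
    return [m.start() for m in pattern.finditer(seq)]


# Two boxes separated by a gap of 1 to 4 arbitrary bases.
hits = find_structured_motif(
    "TTGACAAAAATATAATCCTTGACAGTATAAT",
    ["TTGACA", "TATAAT"],
    [(1, 4)],
)
```

A profile (positional weight matrix) representation would replace the literal components with per-position scores and a threshold; that variant is not shown here.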

13.

MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics

Raphaël Helaers Michel C Milinkovitch 《BMC bioinformatics》2010,11(1):379

Background

The development, in the last decade, of stochastic heuristics implemented in robust application software has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities.

14.

Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework

Kazutaka Katoh Hiroyuki Toh 《BMC bioinformatics》2008,9(1):212

Background

Structural alignment of RNAs has become increasingly important since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for the cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized.

15.

Accelerated probabilistic inference of RNA structure evolution

Ian Holmes 《BMC bioinformatics》2005,6(1):73

Background

Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered.

16.

Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences

Shih-Hau Chiu Chien-Chi Chen Gwo-Fang Yuan Thy-Hou Lin 《BMC bioinformatics》2006,7(1):304-8

Background

The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the amount of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying fungal genes. The approach involves elucidating the enzyme classification rules hidden in the UniProt protein knowledgebase and then applying them for classification. The association algorithm Apriori is used to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity.

17.

Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs

Bartek Wilczynski Norbert Dojer Mateusz Patelak Jerzy Tiuryn 《BMC bioinformatics》2009,10(1):82

18.

A linear memory algorithm for Baum-Welch training

István Miklós Irmtraud M Meyer 《BMC bioinformatics》2005,6(1):231

Background

Baum-Welch training is an expectation-maximisation algorithm for training the emission and transition probabilities of hidden Markov models in a fully automated way. It can be employed as long as a training set of annotated sequences is known, and provides a rigorous way to derive parameter values which are guaranteed to be at least locally optimal. For complex hidden Markov models such as pair hidden Markov models and very long training sequences, even the most efficient algorithms for Baum-Welch training are currently too memory-consuming. This has so far effectively prevented the automatic parameter training of hidden Markov models that are currently used for biological sequence analyses.

19.

Subfamily specific conservation profiles for proteins based on n-gram patterns

John K Vries Xiong Liu 《BMC bioinformatics》2008,9(1):72

Background

A new algorithm has been developed for generating conservation profiles that reflect the evolutionary history of the subfamily associated with a query sequence. It is based on n-gram patterns (NP{n,m}), which are sets of n residues and m wildcards in windows of size n+m. The generation of conservation profiles is treated as a signal-to-noise problem where the signal is the count of n-gram patterns in target sequences that are similar to the query sequence and the noise is the count over all target sequences. The signal is differentiated from the noise by applying singular value decomposition to sets of target sequences rank-ordered by similarity with respect to the query.
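One plausible reading of the NP{n,m} definition in this abstract can be illustrated with a short counting routine. The sketch assumes that every placement of the m wildcards within a window of size n+m is allowed, which the abstract does not confirm; the function name `ngram_patterns` and the `.` wildcard symbol are hypothetical, and the signal-to-noise and singular value decomposition steps are not covered.

```python
from collections import Counter
from itertools import combinations


def ngram_patterns(seq, n, m, wildcard="."):
    """Count patterns with n residues and m wildcards per window.

    Assumption: wildcards may occupy any m of the n+m window positions.
    """
    width = n + m
    counts = Counter()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Choose which n positions keep their residue; the rest are wildcards.
        for keep in combinations(range(width), n):
            kept = set(keep)
            pattern = "".join(
                c if i in kept else wildcard for i, c in enumerate(window)
            )
            counts[pattern] += 1
    return counts
```

For a single window such as `ABC` with n=2 and m=1, this yields the three patterns `AB.`, `A.C`, and `.BC`, each counted once.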

20.

RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure (Total citations: 1; self-citations: 0; citations by others: 1)

Qi Liu Yu Yang Chun Chen Jiajun Bu Yin Zhang Xiuzi Ye 《BMC bioinformatics》2008,9(1):176

Background

With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none of them is suitable for compressing RNA sequences together with their secondary structures. This kind of compression not only facilitates the maintenance of RNA data, but also supplies a novel way to measure the informational complexity of RNA structural data, raising the possibility of studying the relationship between the functional activities of RNA structures and their complexities, as well as various structural properties of RNA based on compression.