1.

Fast index based algorithms and software for matching position specific scoring matrices

Michael Beckstette Robert Homann Robert Giegerich Stefan Kurtz 《BMC bioinformatics》2006,7(1):389-25

Background

In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common but computationally expensive task.
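To make the cost concrete, here is a minimal sketch of a naive PSSM scan that scores every window of a sequence. It is not the index-based algorithm of the article; the matrix values, the function name `scan_pssm`, and the threshold are hypothetical, and the sequence is assumed to contain only symbols present in the matrix.

```python
from typing import Dict, List, Tuple

# Hypothetical log-odds PSSM for a length-3 DNA motif "ACG":
# one dictionary (column) per motif position.
PSSM: List[Dict[str, float]] = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]


def scan_pssm(
    sequence: str, pssm: List[Dict[str, float]], threshold: float
) -> List[Tuple[int, float]]:
    """Return (start, score) for every window scoring at least `threshold`.

    Naive scan: each of the n - m + 1 windows costs m lookups, so the
    total work grows with n * m for a sequence of length n and a motif
    of length m.
    """
    m = len(pssm)
    hits: List[Tuple[int, float]] = []
    for start in range(len(sequence) - m + 1):
        # Sum the per-position scores of the window starting at `start`.
        score = sum(pssm[i][sequence[start + i]] for i in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    print(scan_pssm("TTACGACG", PSSM, threshold=3.0))
```

Because every window is rescored from scratch, the running time is proportional to sequence length times motif length, which is what makes searches over complete genomes or large databases expensive.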

2.

ORENZA: a web resource for studying ORphan ENZyme activities

Olivier Lespinet Bernard Labedan 《BMC bioinformatics》2006,7(1):436

Background

Despite the current availability of several hundreds of thousands of amino acid sequences, more than 36% of the enzyme activities (EC numbers) defined by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) are not associated with any amino acid sequence in major public databases. This wide gap separating knowledge of biochemical function and sequence information is found for nearly all classes of enzymes. Thus, there is an urgent need to explore these sequence-less EC numbers, in order to progressively close this gap.

3.

rMotifGen: random motif generator for DNA and protein sequences

Eric C Rouchka C Timothy Hardin 《BMC bioinformatics》2007,8(1):292

4.

MicroSyn: A user friendly tool for detection of microsynteny in a gene family

Bin Cai Xiaohan Yang Gerald A Tuskan Zong-Ming Cheng 《BMC bioinformatics》2011,12(1):79

Background

Traditional phylogenetic analysis within a gene family is mainly based on DNA or amino acid sequence homologies. However, these phylogenetic tree analyses are not suitable for "non-traditional" gene families, such as microRNA families, that have very short sequences. For normal protein-coding gene families, low bootstrap values are frequently encountered at some nodes, suggesting low confidence in, or likely inappropriateness of, the placement of the members at those nodes.

5.

Species-specific analysis of protein sequence motifs using mutual information

Jan Hummel Nima Keshvari Wolfram Weckwerth Joachim Selbig 《BMC bioinformatics》2005,6(1):164

6.

Automatic annotation of protein motif function with Gene Ontology terms

Xinghua Lu Chengxiang Zhai Vanathi Gopalakrishnan Bruce G Buchanan 《BMC bioinformatics》2004,5(1):122

Background

Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base.

7.

Multiple sequence alignments of partially coding nucleic acid sequences

Roman R Stocsits Ivo L Hofacker Claudia Fried Peter F Stadler 《BMC bioinformatics》2005,6(1):160

Background

High quality sequence alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Nucleic acid sequences, however, exhibit a much larger sequence heterogeneity compared to their encoded protein sequences due to the redundancy of the genetic code. It is desirable, therefore, to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only a part of the sequence of interest is translated. On the other hand, overlapping reading frames may encode multiple alternative proteins, possibly with intermittent non-coding parts. Examples are, in particular, RNA virus genomes.

8.

STAR: predicting recombination sites from amino acid sequence

Denis C Bauer Mikael Bodén Ricarda Thier Elizabeth M Gillam 《BMC bioinformatics》2006,7(1):437

Background

Designing novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone.

9.

Physicochemical property distributions for accurate and rapid pairwise protein homology detection

Bobbie-Jo M Webb-Robertson Kyle G Ratuiste Christopher S Oehmen 《BMC bioinformatics》2010,11(1):145

Background

The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.

10.

Grammar-based distance in progressive multiple sequence alignment

David J Russell Hasan H Otu Khalid Sayood 《BMC bioinformatics》2008,9(1):306

Background

We propose a multiple sequence alignment (MSA) algorithm and compare its alignment quality and execution time with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment proceeds by pairwise aligning each new sequence with an ensemble of the sequences previously aligned.

11.

A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models

Juliana S Bernardes Alessandra Carbone Gerson Zaverucha 《BMC bioinformatics》2011,12(1):83

Background

Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone" we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM).

12.

G-InforBIO: integrated system for microbial genomics

Naoto Tanaka Takashi Abe Satoru Miyazaki Hideaki Sugawara 《BMC bioinformatics》2006,7(1):368-7

Background

Genome databases contain diverse kinds of information, including gene annotations and nucleotide and amino acid sequences. It is not easy to integrate such information for genomic study. There are few tools for integrated analyses of genomic data; therefore, we developed software that enables users to handle, manipulate, and analyze genome data with a variety of sequence analysis programs.

13.

Screening of transgenic proteins expressed in transgenic food crops for the presence of short amino acid sequences identical to potential, IgE-binding linear epitopes of allergens

Gijs A Kleter Ad ACM Peijnenburg 《BMC structural biology》2002,2(1):8

Background

Transgenic proteins expressed by genetically modified food crops are evaluated for their potential allergenic properties prior to marketing, among other means by identifying short identical amino acid sequences that occur in both the transgenic protein and allergenic proteins. A strategy is proposed in which the positive outcomes of the sequence comparison with a minimal length of six amino acids are further screened for the presence of potential linear IgE-epitopes. This double-track approach involves the use of literature data on IgE-epitopes and an antigenicity prediction algorithm.

14.

Predictors of natively unfolded proteins: unanimous consensus score to detect a twilight zone between order and disorder in generic datasets

Antonio Deiana Andrea Giansanti 《BMC bioinformatics》2010,11(1):198

Background

Natively unfolded proteins lack a well defined three dimensional structure but have important biological functions, suggesting a re-assessment of the structure-function paradigm. Establishing that a given protein is natively unfolded requires laborious experimental investigation, so reliable sequence-only methods for predicting whether a sequence corresponds to a folded or to an unfolded protein are of interest in fundamental and applied studies. Many proteins have amino acid compositions compatible with both the folded and the unfolded state, and belong to a twilight zone between order and disorder. This makes a dichotomous classification of protein sequences into folded and natively unfolded ones difficult. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining well-performing single predictors of folding into a consensus score.

15.

NestedMICA as an ab initio protein motif discovery tool

Mutlu Doğruel Thomas A Down Tim JP Hubbard 《BMC bioinformatics》2008,9(1):19

16.

Length-dependent prediction of protein intrinsic disorder (total citations: 2; self-citations: 0; citations by others: 2)

Kang Peng Predrag Radivojac Slobodan Vucetic A Keith Dunker Zoran Obradovic 《BMC bioinformatics》2006,7(1):208-17

Background

Due to the functional importance of intrinsically disordered proteins or protein regions, prediction of intrinsic protein disorder from amino acid sequence has become an area of active research, as witnessed in the 6th experiment on Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues) with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is that the amino acid compositions and sequence properties of disordered regions depend on their length.

17.

Mining prokaryotic genomes for unknown amino acids: a stop-codon-based approach

Masashi Fujita Hisaaki Mihara Susumu Goto Nobuyoshi Esaki Minoru Kanehisa 《BMC bioinformatics》2007,8(1):225

Background

Selenocysteine and pyrrolysine are the 21st and 22nd amino acids, which are genetically encoded by stop codons. Since a number of microbial genomes have been completely sequenced to date, it is tempting to ask whether a 23rd amino acid remains undiscovered in these genomes. Recently, a computational study addressed this question and reported that no tRNA gene for an unknown amino acid was found in the available genome sequences. However, the performance of the tRNA prediction program on an unknown tRNA family, which may have atypical sequence and structure, is unclear, rendering that result inconclusive. A protein-level study will provide independent insight into the novel amino acid.

18.

Low-complexity regions within protein sequences have position-dependent roles (total citations: 1; self-citations: 0; citations by others: 1)

Alain Coletta John W Pinney David Y Weiss Solís James Marsh Steve R Pettifer Teresa K Attwood 《BMC systems biology》2010,4(1):43

Background

Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families; ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.

19.

Incorporating background frequency improves entropy-based residue conservation measures

Kai Wang Ram Samudrala 《BMC bioinformatics》2006,7(1):385

Background

Several entropy-based methods have been developed for scoring sequence conservation in protein multiple sequence alignments. High scoring amino acid positions may correlate with structurally or functionally important residues. However, amino acid background frequencies are usually not taken into account in these entropy-based scoring schemes.
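As an illustration of why background frequencies can matter, the sketch below contrasts plain Shannon entropy of an alignment column with relative entropy (Kullback-Leibler divergence) against a background distribution. Relative entropy is one common way to include background frequencies and is not necessarily the measure used in the article; the three-letter alphabet and its background values are hypothetical, and gaps and pseudocounts are ignored.

```python
import math
from collections import Counter
from typing import Dict

# Hypothetical background frequencies for a toy three-letter alphabet.
BACKGROUND: Dict[str, float] = {"L": 0.5, "A": 0.375, "W": 0.125}


def frequencies(column: str) -> Dict[str, float]:
    """Observed residue frequencies in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def shannon_entropy(column: str) -> float:
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    return -sum(p * math.log2(p) for p in frequencies(column).values())


def relative_entropy(column: str, background: Dict[str, float]) -> float:
    """Kullback-Leibler divergence (bits) of the column from the background.

    Larger values mean the column departs more strongly from what the
    background composition alone would produce.
    """
    return sum(
        p * math.log2(p / background[residue])
        for residue, p in frequencies(column).items()
    )


if __name__ == "__main__":
    for column in ("LLLL", "WWWW"):
        print(column, shannon_entropy(column), relative_entropy(column, BACKGROUND))
```

Both fully conserved columns have Shannon entropy 0, so an entropy-only score cannot tell them apart, whereas relative entropy ranks the column conserved for the rarer residue higher.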

20.

Indications that "codon boundaries" are physico-chemically defined and that protein-folding information is contained in the redundant exon bases

Jan Charles Biro 《Theoretical biology & medical modelling》2006,3(1):28-11

Background

All the information necessary for protein folding is supposed to be present in the amino acid sequence. It is still not possible to provide specific ab initio structure predictions by bioinformatic methods. It is suspected that additional folding information is present in protein coding nucleic acid sequences, but this is not represented by the known genetic code.