1.

PSSM-based prediction of DNA binding sites in proteins

Shandar Ahmad Akinori Sarai 《BMC bioinformatics》2005,6(1):33

Background

Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that a residue and its sequence neighbor information can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even if no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural network based algorithm to utilize evolutionary information of amino acid sequences in terms of their position specific scoring matrices (PSSMs) for a better prediction of DNA-binding sites.

2.

Novel and unexpected bacterial diversity in an arsenic-rich ecosystem revealed by culture-dependent approaches

François Delavat Marie-Claire Lett Didier Lièvremont 《Biology direct》2012,7(1):1-14

Background

BLAST is a commonly used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch.

Results

We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI's Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC₅₀₀₀ of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST.

Conclusions

DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the "Protein BLAST" link at http://blast.ncbi.nlm.nih.gov.

Reviewers

This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.
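The abstract above refers to building a position-specific score matrix (PSSM) from the matches of a search round. As a rough illustration of that idea only, the following sketch derives per-column log-odds scores from a toy alignment. It is not the PSI-BLAST or DELTA-BLAST procedure: the uniform background frequencies, the fixed pseudocount of 1, the helper names `build_pssm` and `score_segment`, and the toy sequences are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pssm = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm

def score_segment(pssm, segment):
    """Sum the column scores of a segment that has the same length as the PSSM."""
    return sum(pssm[i][aa] for i, aa in enumerate(segment))

# Hypothetical toy alignment: four aligned sequences of length four.
toy_alignment = ["ACDW", "ACEW", "ACDW", "GCDW"]
pssm = build_pssm(toy_alignment)
```

A segment that resembles the aligned sequences, such as `score_segment(pssm, "ACDW")`, scores higher than an unrelated one, such as `score_segment(pssm, "WWWW")`.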

3.

Protein disorder prediction by condensed PSSM considering propensity for order or disorder

Chung-Tsai Su Chien-Yu Chen Yu-Yen Ou 《BMC bioinformatics》2006,7(1):319-16

Background

More and more disordered regions have been discovered in protein sequences, and many of them are found to be functionally significant. Previous studies reveal that disordered regions of a protein can be predicted from its primary structure, the amino acid sequence. One widely accepted observation is that ordered regions usually have a compositional bias toward hydrophobic amino acids, whereas disordered regions are biased toward charged amino acids. Recent studies further show that employing evolutionary information such as position specific scoring matrices (PSSMs) improves the prediction accuracy of protein disorder. As more and more machine learning techniques have been introduced to protein disorder detection, extracting more useful features with biological insights attracts more attention.

4.

Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues

Anand B Gowri VS Srinivasan N 《Bioinformatics (Oxford, England)》2005,21(12):2821-2826

MOTIVATION: Position specific scoring matrices (PSSMs) corresponding to aligned sequences of homologous proteins are commonly used in homology detection. A PSSM is generated on the basis of one of the homologues as a reference sequence, which is the query in the case of PSI-BLAST searches. The reference sequence is chosen arbitrarily while generating PSSMs for reverse BLAST searches. In this work we demonstrate that the use of multiple PSSMs corresponding to a given alignment and variable reference sequences is more effective than using traditional single PSSMs and hidden Markov models. RESULTS: Searches for proteins with known 3-D structures have been made against three databases of protein family profiles corresponding to known structures: (1) One PSSM per family; (2) multiple PSSMs corresponding to an alignment and variable reference sequences for every family; and (3) hidden Markov models. A comparison of the performances of these three approaches suggests that the use of multiple PSSMs is most effective. CONTACT: ns@mbu.iisc.ernet.in.

5.

The p53HMM algorithm: using profile hidden markov models to detect p53-responsive genes

Todd Riley Xin Yu Eduardo Sontag Arnold Levine 《BMC bioinformatics》2009,10(1):111-13

6.

Predicting beta-turns in proteins using support vector machines with fractional polynomials

Murtada Khalafallah Elbashir Jianxin Wang Fang-Xiang Wu Lusheng Wang 《Proteome science》2013,11(Z1):S5

Background

β-turns are a secondary structure type that plays an essential role in molecular recognition, protein folding, and stability. They are the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered one of the crucial problems in bioinformatics and molecular biology, and it can provide valuable insights and inputs for fold recognition and drug design.

Results

We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features.

Conclusions

In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance than other competing prediction methods.
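The abstract above reports results as Qtotal and the Matthews correlation coefficient (MCC). As a reminder of how these two measures are commonly computed from per-residue binary predictions, here is a minimal sketch. The function names and the toy labels are hypothetical; Qtotal is taken here to be the percentage of correctly classified residues, which is the usual definition but is an assumption about how the authors computed it.

```python
import math

def confusion_counts(true_labels, predicted_labels):
    """Count true/false positives and negatives for binary labels (1 = turn)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def q_total(tp, tn, fp, fn):
    """Percentage of residues classified correctly."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy data: eight residues, 1 = beta-turn, 0 = non-turn.
true_labels = [1, 1, 0, 0, 0, 1, 0, 0]
predicted_labels = [1, 0, 0, 0, 1, 1, 0, 0]
counts = confusion_counts(true_labels, predicted_labels)
```

Unlike Qtotal, MCC stays informative when turns are much rarer than non-turns, which is one reason both numbers are usually reported together.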

7.

Derivative-free neural network for optimizing the scoring functions associated with dynamic programming of pairwise-profile alignment

Kazunori D. Yamada 《Algorithms for molecular biology : AMB》2018,13(1):5

Background

A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. By definition, these functions cannot capture nonlinear relationships between profiles. Therefore, we attempted to discover a novel scoring function, which was more suitable for the profile-comparison method than existing functions, using neural networks.

Results

Although neural networks require derivative-of-cost functions, the problem addressed in this study lacked them. Therefore, we implemented a novel derivative-free neural network by combining a conventional neural network with an evolutionary strategy optimization method used as a solver. Using this novel neural network system, we optimized the scoring function to align remote sequence pairs. Our results showed that the pairwise-profile aligner using the novel scoring function significantly improved both alignment sensitivity and precision relative to aligners using existing functions.

Conclusions

We developed and implemented a novel derivative-free neural network and aligner (Nepal) for optimizing sequence alignments. Nepal improved alignment quality by adapting to remote sequence alignments and increasing the expressiveness of similarity scores. Additionally, this novel scoring function can be realized using a simple matrix operation and easily incorporated into other aligners. Moreover, our scoring function could potentially improve the performance of homology detection and/or multiple-sequence alignment of remote homologous sequences. The goal of the study was to provide a novel scoring function for profile alignment methods and to develop a novel learning system capable of addressing derivative-free problems. Our system is capable of optimizing the performance of other sophisticated methods and solving problems without derivative-of-cost functions, which do not always exist in practical problems. Our results demonstrated the usefulness of this optimization method for derivative-free problems.
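The Background of this abstract names cosine similarity and correlation coefficients as the existing scoring functions for comparing PSSM columns. The sketch below shows those two baseline functions only, not the neural-network scoring function the authors propose. The function names and the toy column vectors are hypothetical, and real PSSM columns would have 20 entries rather than 4.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two score vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pearson_correlation(u, v):
    """Pearson correlation, computed as the cosine similarity of centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [b - mean_v for b in v]
    return cosine_similarity(centered_u, centered_v)

# Hypothetical toy columns (shortened to four entries for readability).
col_a = [2.0, -1.0, 0.0, 3.0]
col_b = [4.0, -2.0, 0.0, 6.0]   # col_a scaled by 2
col_c = [3.0, 0.0, 1.0, 4.0]    # col_a shifted by +1
```

Both functions depend only on inner products of the two vectors, so they respond to linear relationships between columns, which is consistent with the limitation the abstract points out. The shifted column `col_c` shows the difference between them: its correlation with `col_a` is 1, while its cosine similarity is below 1.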

8.

Accuracy of structure-based sequence alignment of automatic methods

Changhoon Kim Byungkook Lee 《BMC bioinformatics》2007,8(1):355

Background

Accurate sequence alignments are essential for homology searches and for building three-dimensional structural models of proteins. Since structure is better conserved than sequence, structure alignments have been used to guide sequence alignments and are commonly used as the gold standard for sequence alignment evaluation. Nonetheless, as far as we know, there is no report of a systematic evaluation of pairwise structure alignment programs in terms of the sequence alignment accuracy.

9.

MAGIC-SPP: a database-driven DNA sequence processing package with associated management tools

Chun Liang Feng Sun Haiming Wang Junfeng Qu Robert M Freeman Jr Lee H Pratt Marie-Michèle Cordonnier-Pratt 《BMC bioinformatics》2006,7(1):115-15

Background

Processing raw DNA sequence data is an especially challenging task for relatively small laboratories and core facilities that produce as many as 5000 or more DNA sequences per week from multiple projects in widely differing species. To meet this challenge, we have developed the flexible, scalable, and automated sequence processing package described here.

10.

Multiple non-collinear TF-map alignments of promoter regions

Enrique Blanco Roderic Guigó Xavier Messeguer 《BMC bioinformatics》2007,8(1):138

Background

The analysis of the promoter sequences of genes with similar expression patterns is a basic tool to annotate common regulatory elements. Multiple sequence alignments are the basis of most comparative approaches. The characterization of regulatory regions from co-expressed genes at the sequence level, however, often does not yield satisfactory results, as promoter regions of genes sharing similar expression programs often do not show nucleotide sequence conservation.

11.

SeqHound: biological sequence and structure database as a platform for bioinformatics research

Katerina Michalickova Gary D Bader Michel Dumontier Hao Lieu Doron Betel Ruth Isserlin Christopher WV Hogue 《BMC bioinformatics》2002,3(1):32-13

Background

SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment.

12.

PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences

Saurabh Sinha Mathieu Blanchette Martin Tompa 《BMC bioinformatics》2004,5(1):170

13.

Embedding strategies for effective use of information from multiple sequence alignments.

S. Henikoff J. G. Henikoff 《Protein science : a publication of the Protein Society》1997,6(3):698-705

We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.

14.

Using 3D Hidden Markov Models that explicitly represent spatial coordinates to model and compare protein structures

Vadim Alexandrov Mark Gerstein 《BMC bioinformatics》2004,5(1):2

Background

Hidden Markov Models (HMMs) have proven very useful in computational biology for such applications as sequence pattern matching, gene-finding, and structure prediction. Thus far, however, they have been confined to representing 1D sequence (or the aspects of structure that could be represented by character strings).

15.

Automated functional classification of experimental and predicted protein structures

Kai Wang Ram Samudrala 《BMC bioinformatics》2006,7(1):278

Background

Proteins that are similar in sequence or structure may perform different functions in nature. In such cases, function cannot be inferred from sequence or structural similarity.

16.

uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts

Minghui Jiang James Anderson Joel Gillespie Martin Mayne 《BMC bioinformatics》2008,9(1):192

Background

Randomly shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. In many cases, biologists need sophisticated shuffling tools that preserve not only the counts of distinct letters but also higher-order statistics such as doublet counts, triplet counts, and, in general, k-let counts.
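To make the notion of k-let counts concrete, the sketch below counts overlapping k-lets in a sequence and applies a naive letter shuffle. It is not the uShuffle algorithm: a naive shuffle is only guaranteed to preserve the counts for k = 1, which is exactly the limitation the abstract describes. The function names, the toy sequence, and the fixed seed are assumptions for this example.

```python
import random
from collections import Counter

def klet_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def naive_shuffle(sequence, seed=0):
    """Permute the letters uniformly at random; preserves only 1-let counts."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    letters = list(sequence)
    rng.shuffle(letters)
    return "".join(letters)

# Hypothetical toy sequence.
original = "ACGTACGT"
shuffled = naive_shuffle(original)
doublets = klet_counts(original, 2)
```

Comparing `klet_counts(original, 2)` with `klet_counts(shuffled, 2)` shows whether a particular shuffle happened to keep the doublet counts; a k-let-preserving tool guarantees that equality by construction.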

17.

Species-specific analysis of protein sequence motifs using mutual information

Jan Hummel Nima Keshvari Wolfram Weckwerth Joachim Selbig 《BMC bioinformatics》2005,6(1):164

18.

LipocalinPred: a SVM-based method for prediction of lipocalins

Jayashree Ramana Dinesh Gupta 《BMC bioinformatics》2009,10(1):445

Background

Functional annotation of rapidly amassing nucleotide and protein sequences presents a challenging task for modern bioinformatics. This is particularly true for protein families sharing extremely low sequence identity, as for lipocalins, a family of proteins with varied functions and great diversity at the sequence level, yet conserved structures.

19.

SIMPROT: Using an empirically determined indel distribution in simulations of protein evolution

Andy Pang Andrew D Smith Paulo AS Nuin Elisabeth RM Tillier 《BMC bioinformatics》2005,6(1):236

Background

General protein evolution models help determine the baseline expectations for the evolution of sequences, and they have been extensively useful in sequence analysis and for the computer simulation of artificial sequence data sets.

20.

Statistical distributions of optimal global alignment scores of random protein sequences

Hongxia Pang Jiaowei Tang Su-Shing Chen Shiheng Tao 《BMC bioinformatics》2005,6(1):257

Background

The inference of homology from statistically significant sequence similarity is a central issue in sequence alignments. So far the statistical distribution function underlying the optimal global alignments has not been completely determined.
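For readers unfamiliar with the quantity being studied, the sketch below computes an optimal global alignment score with the Needleman-Wunsch recurrence and collects scores for random sequence pairs. It is an illustration only and does not reproduce the paper's setup: the match/mismatch scores, the linear gap penalty, the uniform residue frequencies, and the function names are all assumptions, whereas protein studies typically use a substitution matrix and affine gap costs.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def random_pair_scores(n_pairs, length, seed=0):
    """Scores of n_pairs random sequence pairs (uniform residue frequencies)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        a = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        b = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        scores.append(global_alignment_score(a, b))
    return scores

# Hypothetical sample: 50 random pairs of length 30.
sample = random_pair_scores(50, 30)
```

A histogram of a much larger sample of this kind is the empirical distribution against which an observed alignment score would be judged.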