期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Generating artificial homologous proteins according to the representative family character in molecular mechanics properties - an attempt in validating an underlying rule of protein evolution

Xin Liu 《FEBS letters》2010,584(5):1059-1065

The molecular mechanics property is the foundation of many characters of proteins. Based on intramolecular hydrophobic force network, the representative family character underlying a protein’s mechanics property is described by a simple two-letter scheme. The tendency of a sequence to become a member of a protein family is scored according to this mathematical representation. Remote homologs of the WW-domain family could be easily designed using such a mechanistic signature of protein homology. Experimental validation showed that nearly all artificial homologs have the representative folding and bioactivity of their assigned family. Since the molecular mechanics property is the only consideration in this study, the results indicate its possible role in the generation of new members of a protein family during evolution. 相似文献

2.

A word-oriented approach to alignment validation

Beiko RG Chan CX Ragan MA 《Bioinformatics (Oxford, England)》2005,21(10):2230-2239

MOTIVATION: Multiple sequence alignment at the level of whole proteomes requires a high degree of automation, precluding the use of traditional validation methods such as manual curation. Since evolutionary models are too general to describe the history of each residue in a protein family, there is no single algorithm/model combination that can yield a biologically or evolutionarily optimal alignment. We propose a 'shotgun' strategy where many different algorithms are used to align the same family, and the best of these alignments is then chosen with a reliable objective function. We present WOOF, a novel 'word-oriented' objective function that relies on the identification and scoring of conserved amino acid patterns (words) between pairs of sequences. RESULTS: Tests on a subset of reference protein alignments from BAliBASE showed that WOOF tended to rank the (manually curated) reference alignment highest among 1060 alternative (automatically generated) alignments for a majority of protein families. Among the automated alignments, there was a strong positive relationship between the WOOF score and similarity to the reference alignment. The speed of WOOF and its independence from explicit considerations of three-dimensional structure make it an excellent tool for analyzing large numbers of protein families. AVAILABILITY: On request from the authors. 相似文献

3.

Multiple sequence alignment using partial order graphs 总被引：14，自引：0，他引：14

Lee C Grasso C Sharlow MF 《Bioinformatics (Oxford, England)》2002,18(3):452-464

MOTIVATION: Progressive Multiple Sequence Alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this leads to loss of information needed for accurate alignment, and gap scoring artifacts. RESULTS: We present a graph representation of an MSA that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MSA to a profile. This enables our algorithm (Partial Order Alignment (POA)) to guarantee that the optimal alignment of each new sequence versus each sequence in the MSA will be considered. Moreover, this algorithm introduces a new edit operator, homologous recombination, important for multidomain sequences. The algorithm has improved speed (linear time complexity) over existing MSA algorithms, enabling construction of massive and complex alignments (e.g. an alignment of 5000 sequences in 4 h on a Pentium II). We demonstrate the utility of this algorithm on a family of multidomain SH2 proteins, and on EST assemblies containing alternative splicing and polymorphism. AVAILABILITY: The partial order alignment program POA is available at http://www.bioinformatics.ucla.edu/poa. 相似文献

4.

FAST: a novel protein structure alignment algorithm

Zhu J Weng Z 《Proteins》2005,58(3):618-627

We present a novel algorithm named FAST for aligning protein three-dimensional structures. FAST uses a directionality-based scoring scheme to compare the intra-molecular residue-residue relationships in two structures. It employs an elimination heuristic to promote sparseness in the residue-pair graph and facilitate the detection of the global optimum. In order to test the overall accuracy of FAST, we determined its sensitivity and specificity with the SCOP classification (version 1.61) as the gold standard. FAST achieved higher sensitivities than several existing methods (DaliLite, CE, and K2) at all specificity levels. We also tested FAST against 1033 manually curated alignments in the HOMSTRAD database. The overall agreement was 96%. Close inspection of examples from broad structural classes indicated the high quality of FAST alignments. Moreover, FAST is an order of magnitude faster than other algorithms that attempt to establish residue-residue correspondence. Typical pairwise alignments take FAST less than a second with a Pentium III 1.2GHz CPU. FAST software and a web server are available at http://biowulf.bu.edu/FAST/. 相似文献

5.

Remote protein homology detection using recurrence quantification analysis and amino acid physicochemical properties

Yang Y Tantoso E Li KB 《Journal of theoretical biology》2008,252(1):145-154

Remote homology detection refers to the detection of structure homology in evolutionarily related proteins with low sequence similarity. Supervised learning algorithms such as support vector machine (SVM) are currently the most accurate methods. In most of these SVM-based methods, efforts have been dedicated to developing new kernels to better use the pairwise alignment scores or sequence profiles. Moreover, amino acids’ physicochemical properties are not generally used in the feature representation of protein sequences. In this article, we present a remote homology detection method that incorporates two novel features: (1) a protein's primary sequence is represented using amino acid's physicochemical properties and (2) the similarity between two proteins is measured using recurrence quantification analysis (RQA). An optimization scheme was developed to select different amino acid indices (up to 10 for a protein family) that are best to characterize the given protein family. The selected amino acid indices may enable us to draw better biological explanation of the protein family classification problem than using other alignment-based methods. An SVM-based classifier will then work on the space described by the RQA metrics. The classification scheme is named as SVM-RQA. Experiments at the superfamily level of the SCOP1.53 dataset show that, without using alignment or sequence profile information, the features generated from amino acid indices are able to produce results that are comparable to those obtained by the published state-of-the-art SVM kernels. In the future, better prediction accuracies can be expected by combining the alignment-based features with our amino acids property-based features. Supplementary information including the raw dataset, the best-performing amino acid indices for each protein family and the computed RQA metrics for all protein sequences can be downloaded from http://ym151113.ym.edu.tw/svm-rqa. 相似文献

6.

Incremental window-based protein sequence alignment algorithms

Rangwala H Karypis G 《Bioinformatics (Oxford, England)》2007,23(2):e17-e23

MOTIVATION: Protein sequence alignment plays a critical role in computational biology as it is an integral part in many analysis tasks designed to solve problems in comparative genomics, structure and function prediction, and homology modeling. METHODS: We have developed novel sequence alignment algorithms that compute the alignment between a pair of sequences based on short fixed- or variable-length high-scoring subsequences. Our algorithms build the alignments by repeatedly selecting the highest scoring pairs of subsequences and using them to construct small portions of the final alignment. We utilize PSI-BLAST generated sequence profiles and employ a profile-to-profile scoring scheme derived from PICASSO. RESULTS: We evaluated the performance of the computed alignments on two recently published benchmark datasets and compared them against the alignments computed by existing state-of-the-art dynamic programming-based profile-to-profile local and global sequence alignment algorithms. Our results show that the new algorithms achieve alignments that are comparable with or better than those achieved by existing algorithms. Moreover, our results also showed that these algorithms can be used to provide better information as to which of the aligned positions are more reliable--a critical piece of information for comparative modeling applications. 相似文献

7.

Correspondences between low-energy modes in enzymes: dynamics-based alignment of enzymatic functional families

Zen A Carnevale V Lesk AM Micheletti C 《Protein science : a publication of the Protein Society》2008,17(5):918-929

Proteins that show similarity in their equilibrium dynamics can be aligned by identifying regions that undergo similar concerted movements. These movements are computed from protein native structures using coarse-grained elastic network models. We show the existence of common large-scale movements in enzymes selected from the main functional and structural classes. Alignment via dynamics does not require prior detection of sequence or structural correspondence. Indeed, a third of the statistically significant dynamics-based alignments involve enzymes that lack substantial global or local structural similarities. The analysis of specific residue-residue correspondences of these structurally dissimilar enzymes in some cases suggests a functional relationship of the detected common dynamic features. Including dynamics-based criteria in protein alignment thus provides a promising avenue for relating and grouping enzymes in terms of dynamic aspects that often, though not always, assist or accompany biological function. 相似文献

8.

Pairwise alignment of the DNA sequence using hypercomplex number representation

Shu JJ Ouw LS 《Bulletin of mathematical biology》2004,66(5):1423-1438

A new set of DNA base-nucleic acid codes and their hypercomplex number representation have been introduced for taking the probability of each nucleotide into full account. A new scoring system has been proposed to suit the hypercomplex number representation of the DNA base-nucleic acid codes and incorporated with the method of dot matrix analysis and various algorithms of sequence alignment. The problem of DNA sequence alignment can be processed in a rather similar way to pairwise alignment of the protein sequence. 相似文献

9.

Embedding strategies for effective use of information from multiple sequence alignments. 总被引：4，自引：0，他引：4

下载免费PDF全文

S. Henikoff J. G. Henikoff 《Protein science : a publication of the Protein Society》1997,6(3):698-705

We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. 相似文献

10.

The β structure: Inter-strand correlations

Gunnar von Heijne Clas Blomberg 《Journal of molecular biology》1977,117(3):821-824

β structures in 19 proteins have been analysed for pairwise residue-residue correlations. The results show that hydrophobic residues tend to go together mainly as inter-strand nearest-neighbours, a non-local sequence correlation. This suggests that hydrophobic residues in different parts of the protein molecule might “recognize” each other when a β sheet is formed. 相似文献

11.

Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost

Shinsuke Yamada Osamu Gotoh Hayato Yamana 《BMC bioinformatics》2006,7(1):524-17

Background

Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels. 相似文献

12.

Sequence conservation in families whose members have little or no sequence similarity: the four-helical cytokines and cytochromes 总被引：1，自引：0，他引：1

Hill EE Morea V Chothia C 《Journal of molecular biology》2002,322(1):205-233

Proteins for which there are good structural, functional and genetic similarities that imply a common evolutionary origin, can have sequences whose similarities are low or undetectable by conventional sequence comparison procedures. Do these proteins have sequence conservation beyond the simple conservation of hydrophobic and hydrophilic character at specific sites and if they do what is its nature? To answer these questions we have analysed the structures and sequences of two superfamilies: the four-helical cytokines and cytochromes c'-b(562). Members of these superfamilies have sequence similarities that are either very low or not detectable. The cytokine superfamily has within it a long chain family and a short chain family. The sequences of known representative structures of the two families were aligned using structural information. From these alignments we identified the regions that conserve the same main-chain conformation: the common core (CC). For members of the same family, the CC comprises some 50% of the individual structures; for the combination of both families it is 30%. We added homologous sequences to the structural alignment. Analysis of the residues occurring at sites within the CCs showed that 30% have little or no conservation, whereas about 40% conserve the polar/neutral or hydrophobic/neutral character of their residues. The remaining 30% conserve hydrophobic residues with strong or medium limitations on their volume variations. Almost all of these residues are found at sites that form the "buried spine" of each helix (at sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.) and they pack together at the centre of each structure to give a pattern of residue-residue contacts that is almost absolutely conserved. These CC conserved hydrophobic residues form only 10-15% of all the residues in the individual structures.A similar analysis of the cytochromes c'-b(562), which bind haem and have a very different function to that of the cytokines, gave very similar results. Again some 30% of the CC residues have hydrophobic residues with strong or medium conservation. Most of these form the buried spine of each helix and play the same role as those in the cytokines. The others, and some spine residues bind the haem co-factor. 相似文献

13.

A network synthesis model for generating protein interaction network families

SM Sahraeian BJ Yoon 《PloS one》2012,7(8):e41474

In this work, we introduce a novel network synthesis model that can generate families of evolutionarily related synthetic protein-protein interaction (PPI) networks. Given an ancestral network, the proposed model generates the network family according to a hypothetical phylogenetic tree, where the descendant networks are obtained through duplication and divergence of their ancestors, followed by network growth using network evolution models. We demonstrate that this network synthesis model can effectively create synthetic networks whose internal and cross-network properties closely resemble those of real PPI networks. The proposed model can serve as an effective framework for generating comprehensive benchmark datasets that can be used for reliable performance assessment of comparative network analysis algorithms. Using this model, we constructed a large-scale network alignment benchmark, called NAPAbench, and evaluated the performance of several representative network alignment algorithms. Our analysis clearly shows the relative performance of the leading network algorithms, with their respective advantages and disadvantages. The algorithm and source code of the network synthesis model and the network alignment benchmark NAPAbench are publicly available at http://www.ece.tamu.edu/bjyoon/NAPAbench/. 相似文献

14.

Sigma: multiple alignment of weakly-conserved non-coding DNA sequence

Rahul Siddharthan 《BMC bioinformatics》2006,7(1):143-15

Background

Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. 相似文献

15.

A pair-to-pair amino acids substitution matrix and its applications for protein structure prediction

Eyal E Frenkel-Morgenstern M Sobolev V Pietrokovski S 《Proteins》2007,67(1):142-153

We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large amount of integrated high quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. P2PMAT matrix implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Predicting protein residue-residue contacts from sequence information alone, by our method (P2PConPred) is particularly accurate in the protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets of (300-2000) protein decoys. On the basis of evolutionary information alone our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of sets the native structure is ranked in the top 10% of the decoys. The method can, thus, be used to assist filtering wrong models, complementing traditional scoring functions. 相似文献

16.

Multiple RNA structure alignment

Wang Z Zhang K 《Journal of bioinformatics and computational biology》2005,3(3):609-626

Ribonucleic Acid (RNA) structures can be viewed as a special kind of strings where characters in a string can bond with each other. The question of aligning two RNA structures has been studied for a while, and there are several successful algorithms that are based upon different models. In this paper, by adopting the model introduced in Wang and Zhang,(19) we propose two algorithms to attack the question of aligning multiple RNA structures. Our methods are to reduce the multiple RNA structure alignment problem to the problem of aligning two RNA structure alignments. Meanwhile, we will show that the framework of sequence center star alignment algorithm can be applied to the problem of multiple RNA structure alignment, and if the triangle inequality is met in the scoring matrix, the approximation ratio of the algorithm remains to be 2-2(over)n, where n is the total number of structures. 相似文献

17.

Using sequence compression to speedup probabilistic profile matching

Freschi V Bogliolo A 《Bioinformatics (Oxford, England)》2005,21(10):2225-2229

MOTIVATION: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P < N. RESULTS: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding. 相似文献

18.

Learning scoring schemes for sequence alignment from partial examples

Kim E Kececioglu J 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2008,5(4):546-556

When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for aligning biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the scores of the example alignments close to those of optimal alignments for their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the alignment is left unspecified, and to an improved formulation based on minimizing the average error between the score of an example and the score of an optimal alignment. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25%. 相似文献

19.

Residue-residue contact substitution probabilities derived from aligned three-dimensional structures and the identification of common folds. 总被引：2，自引：1，他引：1

下载免费PDF全文

M. A. Rodionov M. S. Johnson 《Protein science : a publication of the Protein Society》1994,3(12):2366-2377

We report the derivation of scores that are based on the analysis of residue-residue contact matrices from 443 3-dimensional structures aligned structurally as 96 families, which can be used to evaluate sequence-structure matches. Residue-residue contacts and the more than 3 x 10(6) amino acid substitutions that take place between pairs of these contacts at aligned positions within each family of structures have been tabulated and segregated according to the solvent accessibility of the residues involved. Contact maps within a family of structures are shown to be highly conserved (approximately 75%) even when the sequence identity is approaching 10%. In a comparison involving a globin structure and the search of a sequence databank (> 21,000 sequences), the contact probability scores are shown to provide a very powerful secondary screen for the top scoring sequence-structure matches, where between 69% and 84% of the unrelated matches are eliminated. The search of an aligned set of 2 globins against a sequence databank and the subsequent residue contact-based evaluation of matches locates all 618 globin sequences before the first non-globin match. From a single bacterial serine proteinase structure, the structural template approach coupled with residue-residue contact substitution data lead to the detection of the mammalian serine proteinase family among the top matches in the search of a sequence databank. 相似文献

20.

Improvement of the GenTHREADER method for genomic fold recognition 总被引：10，自引：0，他引：10

McGuffin LJ Jones DT 《Bioinformatics (Oxford, England)》2003,19(7):874-881

MOTIVATION: In order to enhance genome annotation, the fully automatic fold recognition method GenTHREADER has been improved and benchmarked. The previous version of GenTHREADER consisted of a simple neural network which was trained to combine sequence alignment score, length information and energy potentials derived from threading into a single score representing the relationship between two proteins, as designated by CATH. The improved version incorporates PSI-BLAST searches, which have been jumpstarted with structural alignment profiles from FSSP, and now also makes use of PSIPRED predicted secondary structure and bi-directional scoring in order to calculate the final alignment score. Pairwise potentials and solvation potentials are calculated from the given sequence alignment which are then used as inputs to a multi-layer, feed-forward neural network, along with the alignment score, alignment length and sequence length. The neural network has also been expanded to accommodate the secondary structure element alignment (SSEA) score as an extra input and it is now trained to learn the FSSP Z-score as a measurement of similarity between two proteins. RESULTS: The improvements made to GenTHREADER increase the number of remote homologues that can be detected with a low error rate, implying higher reliability of score, whilst also increasing the quality of the models produced. We find that up to five times as many true positives can be detected with low error rate per query. Total MaxSub score is doubled at low false positive rates using the improved method. AVAILABILITY: http://www.psipred.net. 相似文献