Found 20 similar articles (search time: 31 ms)
1.
We report the largest and most comprehensive comparison of protein structural alignment methods. Specifically, we evaluate six publicly available structure alignment programs: SSAP, STRUCTAL, DALI, LSQMAN, CE and SSM by aligning all 8,581,970 protein structure pairs in a test set of 2930 protein domains specially selected from CATH v.2.4 to ensure sequence diversity. We consider an alignment good if it matches many residues, and the two substructures are geometrically similar. Even with this definition, evaluating structural alignment methods is not straightforward. At first, we compared the rates of true and false positives using receiver operating characteristic (ROC) curves with the CATH classification taken as a gold standard. This proved unsatisfactory in that the quality of the alignments is not taken into account: sometimes a method that finds less good alignments scores better than a method that finds better alignments. We correct this intrinsic limitation by using four different geometric match measures (SI, MI, SAS, and GSAS) to evaluate the quality of each structural alignment. With this improved analysis we show that there is a wide variation in the performance of different methods; the main reason for this is that it can be difficult to find a good structural alignment between two proteins even when such an alignment exists. We find that STRUCTAL and SSM perform best, followed by LSQMAN and CE. Our focus on the intrinsic quality of each alignment allows us to propose a new method, called "Best-of-All" that combines the best results of all methods. Many commonly used methods miss 10-50% of the good Best-of-All alignments. By putting existing structural alignments into proper perspective, our study allows better comparison of protein structures. By highlighting limitations of existing methods, it will spur the further development of better structural alignment methods. 
This will have significant biological implications now that structural comparison has come to play a central role in the analysis of experimental work on protein structure, protein function and protein evolution.
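As a quick check on the figures quoted in this abstract, the pair count appears consistent with counting every ordered pair of distinct domains in the 2930-domain test set. The sketch below is only arithmetic on the two numbers given in the abstract; the variable names are illustrative and the "ordered pairs" reading is an inference, not something the abstract states.

```python
# Arithmetic check on the numbers quoted in the abstract (names are illustrative).
n_domains = 2930

# Ordered pairs of distinct domains: each domain paired with every other one,
# counting (A, B) and (B, A) separately.
ordered_pairs = n_domains * (n_domains - 1)

# Unordered pairs, for comparison: each pair counted once.
unordered_pairs = ordered_pairs // 2

print(ordered_pairs)    # 8581970, the figure quoted in the abstract
print(unordered_pairs)  # 4290985
```

Under this reading, each pair of domains would be aligned in both directions, which matters for methods whose result depends on which structure is treated as the query.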
2.
Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template‐defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile‐based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to an overall increase in alignment accuracy. Evaluation of several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa . Proteins 2015; 83:411–427. © 2014 Wiley Periodicals, Inc.
3.
Background
Proteins are composed of domains, protein segments that fold independently from the rest of the protein and have a specific function. During evolution the arrangement of domains can change: domains are gained, lost or their order is rearranged. To facilitate the analysis of these changes we propose the use of multiple domain alignments.
Results
We developed an alignment program, called MDAT, which aligns multiple domain arrangements. MDAT extends earlier programs which perform pairwise alignments of domain arrangements. MDAT uses a domain similarity matrix to score domain pairs and aligns the domain arrangements using a consistency supported progressive alignment method.
Conclusion
MDAT will be useful for analysing changes in domain arrangements within and between protein families and will thus provide valuable insights into the evolution of proteins and their domains. MDAT is coded in C++, and the source code is freely available for download at http://www.bornberglab.org/pages/mdat.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0442-7) contains supplementary material, which is available to authorized users.
4.
A team at the Lawrence Livermore National Laboratory (LLNL) was given the task of using computational tools to speed up the development of DNA diagnostics for pathogen detection. This work will be described in another paper in this issue (see pages 133-149). To achieve this goal it was necessary to understand the merits and limitations of the various available comparative genomics tools. A review of some recent tools for multisequence/genome alignment and substring comparison is presented, within the general framework of applicability to a large-scale application. We note that genome alignments are important for many things, only one of which is pathogen detection. Understanding gene function, gene regulation, gene networks, phylogenetic studies and other aspects of evolution all depend on accurate nucleic acid and protein sequence alignment. Selecting appropriate tools can make a large difference in the quality of results obtained and the effort required.
5.
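Applications of Vector NTI Suite in molecular biology (cited 1 time: 0 self-citations, 1 by others)
Many software packages can help molecular biologists solve problems they encounter in their day-to-day laboratory work. Vector NTI Suite is a highly integrated, full-featured molecular biology application that can perform a wide range of operations and analyses on molecules such as nucleic acids and proteins, and can build and manage biological databases. This review focuses on an overview of the Vector NTI Suite software, its current state of use, and its applications in molecular biology.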
6.
Zen A Carnevale V Lesk AM Micheletti C 《Protein science : a publication of the Protein Society》2008,17(5):918-929
Proteins that show similarity in their equilibrium dynamics can be aligned by identifying regions that undergo similar concerted movements. These movements are computed from protein native structures using coarse-grained elastic network models. We show the existence of common large-scale movements in enzymes selected from the main functional and structural classes. Alignment via dynamics does not require prior detection of sequence or structural correspondence. Indeed, a third of the statistically significant dynamics-based alignments involve enzymes that lack substantial global or local structural similarities. The analysis of specific residue-residue correspondences of these structurally dissimilar enzymes in some cases suggests a functional relationship of the detected common dynamic features. Including dynamics-based criteria in protein alignment thus provides a promising avenue for relating and grouping enzymes in terms of dynamic aspects that often, though not always, assist or accompany biological function.
7.
Comparing two remotely similar structures is a difficult problem: more often than not, resulting structure alignments will show ambiguities and a unique answer usually does not even exist. In addition, alignments in general have a limited information content because every aligned residue is considered equally important. To solve these issues to a certain extent, one can take the perspective of a whole group of similar structures and then evaluate common structural features. Here, we describe a consistency approach that, although not actually performing a multiple structure alignment, does produce the information that one would conceivably want from such an experiment: the key structural features of the group, e.g., a fold, which in this case are projected onto either a pair of proteins or a single protein. Both representations are useful for a number of applications, ranging from the detection of (partially) wrong structure alignments to protein structure classification and fold recognition. To demonstrate some of these applications, the procedure was applied to 195 SCOP folds containing a total of 1802 domains sharing very low sequence similarity.
8.
9.
Prediction of protein subcellular localization (cited 6 times: 0 self-citations, 6 by others)
Because the protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) have carried out extensive analysis of the relation between sequence similarity and identity in subcellular localization, and have found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used for the prediction accuracy assessment contain highly homologous sequences, with some data sets comprising sequences of up to 80-90% sequence identity. Using these benchmark test data will surely lead to overestimation of the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vectors derived from sequences; the second level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches for two benchmark data sets, one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization.
Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set of high homology levels will undoubtedly lead to biased assessment of the performances of the predictive approaches, especially those relying on homology search or sequence annotations. Our two-level classification system based on SVM does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization.
10.
11.
Zaldivar-Riverón A Belokobylskij SA León-Regagnon V Martínez JJ Briceño R Quicke DL 《Molecular phylogenetics and evolution》2007,44(3):981-992
The braconid wasp subfamily Doryctinae mainly comprises idiobiont ectoparasitoids of other insect larvae. In recent years, however, members of a few genera have been discovered to be associated with galls from various unrelated host plant families, with some of these being gall inducers whereas others are suspected as being predators of gallers. Because of their considerable morphological differences, these gall-associated taxa traditionally have been placed in separate tribes or even in other subfamilies. In this study, we investigate the phylogenetic relationships among representatives of a number of different doryctine genera, including five of its seven gall-associated genera using two genetic markers. Here we analyzed the length-variable 28S sequence data based on secondary structure both excising the unalignable regions and recoding them according to indel length. In addition, multiple alignments were carried out with a range of gap-opening and extension parameters. The combined (28S+CO1) phylogenetic hypotheses obtained, both excluding and recoding the unalignable regions, recover a clade comprising the five gall-associated genera, and most of the analyses using multiple alignments also support this relationship. These results support a scenario in which secondary phytophagy evolves from initially attacking primary gall-forming hosts. The relationships recovered are also more congruent with a model that explains the macroevolution of insect plant association in the Doryctinae as reflecting geographic proximity rather than host plant relationships. Further, our phylogenetic hypotheses consistently show that one of the main morphological features employed in the higher level classification of the Doryctinae is actually highly homoplastic.
12.
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information.
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
13.
Computational gene identification by sequence inspection remains a challenging problem. For a typical Arabidopsis thaliana gene with five exons, at least one of the exons is expected to have at least one of its borders predicted incorrectly by ab initio gene finding programs. More detailed analysis for individual genomic loci can often resolve the uncertainty on the basis of EST evidence or similarity to potential protein homologues. Such methods are part of the routine annotation process. However, because the EST and protein databases are constantly growing, in many cases original annotation must be re-evaluated, extended, and corrected on the basis of the latest evidence. The Arabidopsis Genome Initiative is undertaking this task on the whole-genome scale via its participating genome centers. The current Arabidopsis genome annotation provides an excellent starting point for assessing the protein repertoire of a flowering plant. More accurate whole-genome annotation will require the combination of high-throughput and individual gene experimental approaches and computational methods. The purpose of this article is to discuss tools available to an individual researcher to evaluate gene structure prediction for a particular locus.
14.
15.
Due to the structural and functional importance of tight turns, some methods have been proposed to predict gamma-turns, beta-turns, and alpha-turns in proteins. In the past, studies of pi-turns were made, but not a single prediction approach has been developed so far. It will be useful to develop a method for identifying pi-turns in a protein sequence. In this paper, the support vector machine (SVM) method has been introduced to predict pi-turns from the amino acid sequence. The training and testing of this approach is performed with a newly collected data set of 640 non-homologous protein chains containing 1931 pi-turns. Different sequence encoding schemes have been explored in order to investigate their effects on the prediction performance. With multiple sequence alignment and predicted secondary structure, the final SVM model yields a Matthews correlation coefficient (MCC) of 0.556 by a 7-fold cross-validation. A web server implementing the prediction method is available at the following URL: http://210.42.106.80/piturn/.
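For readers unfamiliar with the Matthews correlation coefficient quoted in this abstract, the following is a minimal sketch of the standard binary-classification formula computed from confusion-matrix counts. The function name `mcc` and the example counts are illustrative assumptions; they are not taken from the paper, and the abstract does not describe how the authors computed their value.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. Returns 0.0 when the denominator is zero, a common
    convention for the undefined case.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
print(mcc(10, 10, 0, 0))  # perfect prediction
print(mcc(5, 5, 5, 5))    # no better than chance
```

The coefficient ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction), which makes it more informative than plain accuracy when one class, such as pi-turn residues, is rare.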
16.
Christopher Bork Kenneth Ng Yinhan Liu Alex Yee Michael Pohlscheidt 《Biotechnology progress》2013,29(2):394-402
Chromatogram overlays are frequently used to monitor inter‐batch performance of bioprocess purification steps. However, the objective analysis of chromatograms is difficult due to peak shifts caused by variable phase durations or unexpected process holds. Furthermore, synchronization of batch process data may also be required prior to performing multivariate analysis techniques. Dynamic time warping was originally developed as a method for spoken word recognition, but shows potential in the objective analysis of time variant signals, such as manufacturing data. In this work we will discuss the application of dynamic time warping with a derivative weighting function to align chromatograms to facilitate process monitoring and fault detection. In addition, we will demonstrate the utility of this method as a preprocessing step for multivariate model development. © 2013 American Institute of Chemical Engineers Biotechnol. Prog., 29: 394–402, 2013
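To illustrate the general idea of dynamic time warping mentioned in this abstract, here is a minimal sketch of the classic algorithm on two one-dimensional signals. It uses a plain absolute-difference cost and does not include the derivative weighting function the authors describe; the function name `dtw_distance` and the toy signals are assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences.

    cost[i][j] holds the minimal cumulative cost of aligning the first i
    points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Toy "chromatograms": the same peak, delayed by one sample in the second run.
run_1 = [0, 1, 2, 1, 0]
run_2 = [0, 0, 1, 2, 1, 0]
print(dtw_distance(run_1, run_2))
```

Because warping can stretch the flat baseline of the first run to absorb the delay, the two toy signals align at zero cost, which is the kind of peak-shift tolerance the abstract relies on.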
17.
Friedberg I Kaplan T Margalit H 《Protein science : a publication of the Protein Society》2000,9(11):2278-2284
The PSI-BLAST algorithm has been acknowledged as one of the most powerful tools for detecting remote evolutionary relationships by sequence considerations only. This has been demonstrated by its ability to recognize remote structural homologues and by the greatest coverage it enables in annotation of a complete genome. Although recognizing the correct fold of a sequence is of major importance, the accuracy of the alignment is crucial for the success of modeling one sequence by the structure of its remote homologue. Here we assess the accuracy of PSI-BLAST alignments on a stringent database of 123 structurally similar, sequence-dissimilar pairs of proteins, by comparing them to the alignments defined on a structural basis. Each protein sequence is compared to a nonredundant database of the protein sequences by PSI-BLAST. Whenever a pair member detects its pair-mate, the positions that are aligned both in the sequential and structural alignments are determined, and the alignment sensitivity is expressed as the percentage of these positions out of the structural alignment. Fifty-two sequences detected their pair-mates (for 16 pairs the success was bi-directional when either pair member was used as a query). The average percentage of correctly aligned residues per structural alignment was 43.5 ± 2.2%. Other properties of the alignments were also examined, such as the sensitivity vs. specificity and the change in these parameters over consecutive iterations. Notably, there is an improvement in alignment sensitivity over consecutive iterations, reaching an average of 50.9 ± 2.5% within the five iterations tested in the current study.
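The alignment sensitivity defined in this abstract (positions aligned identically in both the sequence and structural alignments, as a percentage of the structural alignment) can be sketched as follows. Representing each alignment as a collection of (query position, target position) pairs is an assumption made here for illustration, as are the function name and the toy data; the abstract does not specify the authors' data format.

```python
def alignment_sensitivity(structural_pairs, sequence_pairs):
    """Percentage of structurally aligned position pairs that the
    sequence alignment reproduces exactly.

    Each alignment is an iterable of (query_position, target_position)
    tuples; this representation is an illustrative assumption.
    """
    structural = set(structural_pairs)
    if not structural:
        raise ValueError("structural alignment must contain at least one pair")
    shared = structural & set(sequence_pairs)
    return 100.0 * len(shared) / len(structural)


# Toy alignments: the sequence alignment recovers 2 of the 4 structural pairs.
structural = [(1, 1), (2, 2), (3, 3), (4, 5)]
sequence = [(1, 1), (2, 2), (3, 4)]
print(alignment_sensitivity(structural, sequence))  # 50.0
```

Note that the denominator is the size of the structural alignment, so extra pairs proposed only by the sequence alignment do not lower this measure; they would affect specificity instead.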
18.
Nicolas Thevenin Philippe Bernard Hélène Bourdon Marcel Hibert Camille-Georges Wermuth 《Journal of molecular modeling》2000,6(12):637-647
The binding mode of 3-aminopyridazine analogues to the M1 muscarinic receptor has been studied by two complementary modeling strategies: the active analog approach and direct docking into a 3D model of the receptor. Modeling combined with SAR study: (i) accounts for the contribution to binding of both hydrophilic (Asp311, Asn617) and hydrophobic residues; (ii) illustrates the subtlety of ligand-receptor binding; (iii) highlights a binding site domain that might be responsible for partial or full agonism.
19.
Joshua M Miller Stephen S Moore Paul Stothard Xiaoping Liao David W Coltman 《BMC genomics》2015,16(1)