Similar literature
 20 similar articles found (search time: 31 ms)
1.
C A Orengo  N P Brown  W R Taylor 《Proteins》1992,14(2):139-167
A fast method is described for searching and analyzing the protein structure databank. It uses secondary structure followed by residue matching to compare protein structures and is developed from a previous structural alignment method based on dynamic programming. Linear representations of secondary structures are derived and their features compared to identify equivalent elements in two proteins. The secondary structure alignment then constrains the residue alignment, which compares only residues within aligned secondary structures and with similar buried areas and torsional angles. The initial secondary structure alignment improves accuracy and provides a means of filtering out unrelated proteins before the slower residue alignment stage. It is possible to search or sort the protein structure databank very quickly using just secondary structure comparisons. A search through 720 structures with a probe protein of 10 secondary structures required 1.7 CPU hours on a Sun 4/280. Alternatively, combined secondary structure and residue alignments, with a cutoff on the secondary structure score to remove pairs of unrelated proteins from further analysis, took 10.1 CPU hours. The method was applied in searches on different classes of proteins and to cluster a subset of the databank into structurally related groups. Relationships were consistent with known families of protein structure.
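The filtering idea in this abstract can be sketched as a global dynamic-programming alignment over secondary structure elements. This is a minimal illustration, not the published method: the `(type, length)` element representation, the scoring function, the gap penalty, and the cutoff are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical linear representation: one (type, length) tuple per element,
# where type is "H" (helix) or "E" (strand). The published method compares
# richer features; this is only a stand-in. Lengths are assumed to be > 0.
Element = Tuple[str, int]


def element_score(a: Element, b: Element) -> float:
    """Score two elements: penalize a type mismatch, reward similar lengths."""
    if a[0] != b[0]:
        return -1.0
    return 1.0 - abs(a[1] - b[1]) / max(a[1], b[1])


def sse_alignment_score(p: List[Element], q: List[Element], gap: float = -0.5) -> float:
    """Global (Needleman-Wunsch style) alignment score of two element lists."""
    rows, cols = len(p) + 1, len(q) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + element_score(p[i - 1], q[j - 1]),  # align elements
                dp[i - 1][j] + gap,  # element of p left unmatched
                dp[i][j - 1] + gap,  # element of q left unmatched
            )
    return dp[-1][-1]


def passes_filter(probe: List[Element], target: List[Element], cutoff: float) -> bool:
    """Keep a target for the slower residue-level stage only if it scores well."""
    return sse_alignment_score(probe, target) >= cutoff
```

In a databank search, `passes_filter` would be called once per target structure, and only the survivors would go on to residue-level alignment.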

2.
MOTIVATION: Existing algorithms for automated protein structure alignment generate contradictory results and are difficult to interpret. An algorithm which can provide a context for interpreting the alignment and uses a simple method to characterize protein structure similarity is needed. RESULTS: We describe a heuristic for limiting the search space for structure alignment comparisons between two proteins, and an algorithm for finding minimal root-mean-squared-distance (RMSD) alignments as a function of the number of matching residue pairs within this limited search space. Our alignment algorithm uses coordinates of alpha-carbon atoms to represent each amino acid residue and requires a total computation time of O(m^3 n^2), where m and n denote the lengths of the protein sequences. This makes our method fast enough for comparisons of moderate-size proteins (fewer than approximately 800 residues) on current workstation-class computers and therefore addresses the need for a systematic analysis of multiple plausible shape similarities between two proteins using a widely accepted comparison metric.
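The comparison metric named here, RMSD over matched alpha-carbon pairs, can be sketched as follows. The sketch assumes the two structures are already superimposed and the residue pairing is given; it does not implement the search heuristic or the optimal superposition from the abstract. `rmsd_by_pair_count` is a hypothetical helper that only illustrates the idea of RMSD as a function of the number of matched pairs, with the superposition held fixed.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two alpha-carbon positions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rmsd(pairs: Sequence[Tuple[Point, Point]]) -> float:
    """RMSD over matched pairs of already-superimposed coordinates."""
    if not pairs:
        raise ValueError("need at least one matched pair")
    return math.sqrt(sum(squared_distance(a, b) for a, b in pairs) / len(pairs))


def rmsd_by_pair_count(pairs: Sequence[Tuple[Point, Point]]) -> List[float]:
    """RMSD of the k closest pairs for k = 1..len(pairs).

    Assumption: the superposition is fixed, so this is a simplification of
    "minimal RMSD as a function of the number of matching pairs".
    """
    squared = sorted(squared_distance(a, b) for a, b in pairs)
    curve, running = [], 0.0
    for k, d2 in enumerate(squared, start=1):
        running += d2
        curve.append(math.sqrt(running / k))
    return curve
```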

3.
Two new methods for the visualization of structural similarity in proteins with known three-dimensional structures are presented. They are based on the degree of equivalence of α-carbon pairs in two proteins. The quantitative measure for residue equivalence is the comparison score generated using the sequence and structure alignment method of Taylor and Orengo, which is based on the comparison of interatomic distances (and other properties that can be defined on a residue basis). The first method uses information on corresponding α-carbon positions to display vectors joining these structurally equivalent residues. These vectors can be defined as target constraints, and their minimization “bends” the two proteins toward a common average structure. In the average structure the corresponding residues virtually superpose, while insertions and deletions become clearly visible. The second method uses the comparison scores to perform a weighted least-squares fit of the two structures. It is further used to color code the two structures according to the score value, i.e., their similarity, on a continuous scale from red to blue. Examples of the methods for the comparison of flavodoxin, chemotaxis Y protein and L-arabinose-binding protein are given.

4.
Evaluation and improvements in the automatic alignment of protein sequences   Cited by: 6 (self-citations: 0; citations by others: 6)
The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods. Alignments obtained from the global comparison of five pairs of homologous protein sequences studied gave 54% agreement overall for residues in secondary structures. The inclusion of information about the secondary structure of one of the proteins, in order to limit the number of gaps inserted in regions of secondary structure, improved this figure to 68%. A similarity score of greater than six standard deviation units suggests that an alignment which is greater than 75% correct within secondary structural regions can be obtained automatically for the pair of sequences.
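An agreement figure of this kind can be computed by checking how many structure-based residue pairs a sequence-based alignment reproduces. The sketch below is one plausible definition, not necessarily the one used in the paper: alignments are assumed to be sets of `(i, j)` index pairs, and `ss_positions` is a hypothetical set of positions in the first protein that lie in secondary structure.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]  # (index in protein 1, index in protein 2)


def percent_agreement(
    reference: Iterable[Pair],
    predicted: Iterable[Pair],
    ss_positions: Set[int],
) -> float:
    """Percentage of reference pairs inside secondary structure that the
    predicted alignment reproduces exactly."""
    ref_in_ss = {pair for pair in reference if pair[0] in ss_positions}
    if not ref_in_ss:
        raise ValueError("no reference pairs fall in secondary structure")
    reproduced = ref_in_ss & set(predicted)
    return 100.0 * len(reproduced) / len(ref_in_ss)
```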

5.
We have extended the resolution of the crystal structure of human bactericidal/permeability-increasing protein (BPI) to 1.7 Å. BPI has two domains with the same fold, but with little sequence similarity. To understand the similarity in structure of the two domains, we compare the corresponding residue positions in the two domains by the method of 3D-1D profiles. A 3D-1D profile is a string formed by assigning each position in the 3D structure to one of 18 environment classes. The environment classes are defined by the local secondary structure, the area of the residue which is buried from solvent, and the fraction of the area buried by polar atoms. A structural alignment between the two BPI domains was used to compare the 3D-1D environments of structurally equivalent positions. Greater than 31% of the aligned positions have conserved 3D-1D environments, but only 13% have conserved residue identities. Analysis of the 3D-1D environmentally conserved positions helps to identify pairs of residues likely to be important in conserving the fold, regardless of the residue similarity. We find examples of 3D-1D environmentally conserved positions with dissimilar residues which nevertheless play similar structural roles. To generalize our findings, we analyzed four other proteins with similar structures yet dissimilar sequences. Together, these examples show that aligned pairs of dissimilar residues often share similar structural roles, stabilizing dissimilar sequences in the same fold.
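The comparison of conserved environments against conserved residue identities can be sketched as below. The inputs are assumed to be already restricted to structurally aligned positions, and the environment labels are arbitrary placeholders; how a position is assigned to one of the 18 classes is not shown.

```python
from typing import Hashable, Sequence, Tuple


def conservation_fractions(
    env_a: Sequence[Hashable],
    env_b: Sequence[Hashable],
    seq_a: str,
    seq_b: str,
) -> Tuple[float, float]:
    """Return (fraction with the same environment class, fraction with the
    same residue) over structurally aligned positions.

    All four inputs must have one entry per aligned position.
    """
    n = len(env_a)
    if n == 0 or not (len(env_b) == len(seq_a) == len(seq_b) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    env_conserved = sum(x == y for x, y in zip(env_a, env_b)) / n
    identity = sum(x == y for x, y in zip(seq_a, seq_b)) / n
    return env_conserved, identity
```

A result such as `(0.31, 0.13)` would correspond to the situation reported in the abstract, where environments are conserved more often than residue identities.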

6.
R B Russell  G J Barton 《Proteins》1992,14(2):309-323
An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. A preliminary multiple sequence alignment is performed using sequence information, which then determines an initial superposition of the structures. A structure comparison algorithm is applied to all pairs of proteins in the superimposed set and a similarity tree calculated. Multiple sequence alignments are then generated by following the tree from the branches to the root. At each branchpoint of the tree, a structure-based sequence alignment and coordinate transformations are output, with the multiple alignment of all structures output at the root. The algorithm encoded in STAMP (STructural Alignment of Multiple Proteins) is shown to give alignments in good agreement with published structural accounts within the dehydrogenase fold domains, globins, and serine proteinases. In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment. Sc quantifies the global structural similarity between pairs or groups of proteins, whereas Pij' provides a normalized measure of the confidence in the alignment of each residue. STAMP alignments have the quality of each alignment characterized by Sc and Pij' values and thus provide a reproducible resource for studies of residue conservation within structural motifs.

7.
Standley DM  Toh H  Nakamura H 《Proteins》2004,57(2):381-391
A new algorithm for superimposing protein structures based on maximizing the number of spatially equivalent residues is introduced. The algorithm works in three distinct steps. First, the optimal residue map is calculated by structural alignment. By default, the double dynamic programming algorithm, as implemented in the program ASH, was used for the structure alignment step, but we also present results based on alignments imported from three other programs (Dali, CE, and VAST). Second, the structures are spatially superimposed such that the effective number of equivalent residues (NER), that is, aligned residue pairs that can be spatially overlapped, is maximized. The NER score is an analytic, differentiable similarity function that rewards spatially equivalent residues but ignores non-equivalent ones. Maximization of the NER score results in accurate superpositions in cases where root mean square deviation (RMSD) minimization fails. Third, the NER function is used in conjunction with traditional dynamic programming to realign the structures based on the proximity of residues in the superposition. Results are presented for a wide range of superposition problems and compared to results from Dali, CE, and VAST. In addition, several structure-structure pairs that show only partial similarity are discussed, and results are compared to those from the LGA, SARF2, and ThreeCa programs.

8.
The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith-Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith-Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed-accuracy tradeoff in a number of popular protein structure alignment methods.
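For reference, the quadratic baseline that this abstract sets out to improve on can be sketched as a simple dynamic program. The code below is not the subquadratic algorithm from the article, and it uses a longest-common-subsequence style recurrence rather than full Smith-Waterman scoring: it counts the largest set of sequentially ordered residue pairs whose superimposed coordinates lie within a cutoff. The function name and coordinate format are assumptions for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def max_pairs_under_cutoff(a: Sequence[Point], b: Sequence[Point], cutoff: float) -> int:
    """Largest number of order-preserving residue pairs (i, j) with
    distance(a[i], b[j]) <= cutoff, for two already-superimposed chains.

    Runs in O(len(a) * len(b)) time, i.e. the quadratic baseline.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            best = max(dp[i - 1][j], dp[i][j - 1])  # skip a residue in a or b
            if math.dist(a[i - 1], b[j - 1]) <= cutoff:
                best = max(best, dp[i - 1][j - 1] + 1)  # pair the two residues
            dp[i][j] = best
    return dp[-1][-1]
```

Repeating this routine for every candidate superposition is what makes the quadratic cost a bottleneck in practice.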

9.
Protein structural alignments are generally considered as 'golden standard' for the alignment at the level of amino acid residues. In this study we have compared the quality of pairwise and multiple structural alignments of about 5900 homologous proteins from 718 families of known 3-D structures. We observe shifts in the alignment of regular secondary structural elements (helices and strands) between pairwise and multiple structural alignments. The differences between pairwise and multiple structural alignments within helical and beta-strand regions often correspond to 4 and 2 residue positions respectively. Such shifts correspond approximately to "one turn" of these regular secondary structures. We have performed manual analysis explicitly on the family of protein kinases. We note shifts of one or two turns in helix-helix alignments obtained using pairwise and multiple structural alignments. Investigations on the quality of the equivalent helix-helix, strand-strand pairs in terms of their residue side-chain accessibilities have been made. Our results indicate that the quality of the pairwise alignments is comparable to that of the multiple structural alignments and, in fact, is often better. We propose that pairwise alignment of protein structures should also be used in formulation of methods for structure prediction and evolutionary analysis.

10.
We investigated the conservation of sidechain conformation for each residue within a homologous family of proteins in the Protein Data Bank (PDB) and performed sidechain modeling using this information. The information was represented by the probability of conserved sidechain torsional angles obtained from many families of proteins, and these were calculated for a pair of residues at topologically equivalent positions as a result of structural alignment. Probabilities were obtained for a pair of same amino acids and for a pair of different amino acids. The correlation between environmental residues and the fluctuation of probability was examined for the pair of same amino acid residues, and the simple probability was calculated for the pair of different amino acids. From the results on the same amino acid pairs, 17 amino acids, except for Ala, Gly, and Pro, were divided into two types: those that were influenced and those that were not influenced by the environmental residues. From results on different amino acid pairs, a replacement between large residues, such as Trp, Phe, and Tyr, was performed assuming conservation of their torsional angles within a homologous family of proteins. We performed sidechain modeling for 11 known proteins from their native and modeled backbones, respectively. With the native backbones, the percentage of the χ1 angle correct within 30° was found to be 67% and 80% for all and core residues, respectively. With the modeled backbones, the percentage of the correct χ1 angle was found to be 60% and 72% for all and core residues, respectively. To estimate an upper limit on the accuracy for predicting sidechain conformations, we investigated the probability of conserved sidechain torsional angles for highly similar proteins having >90% sequence identity and <2.5-Å X-ray resolution. In those proteins, 83% of the sidechain conformations were conserved for the χ1 angle. Proteins 31:355–369, 1998. © 1998 Wiley-Liss, Inc.
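The accuracy measure quoted in this abstract (percentage of χ1 angles correct within 30°) needs care with angle wraparound, since -170° and 175° are only 15° apart. A minimal sketch, assuming angles are given in degrees and that the predicted and native lists are in the same residue order:

```python
from typing import Sequence


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def percent_chi1_correct(
    predicted: Sequence[float],
    native: Sequence[float],
    tolerance: float = 30.0,
) -> float:
    """Percentage of residues whose predicted chi1 is within `tolerance`
    degrees of the native chi1."""
    if len(predicted) != len(native) or not predicted:
        raise ValueError("need two non-empty angle lists of equal length")
    correct = sum(
        angular_difference(p, n) <= tolerance for p, n in zip(predicted, native)
    )
    return 100.0 * correct / len(predicted)
```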

11.
Discovery of local packing motifs in protein structures   Cited by: 1 (self-citations: 0; citations by others: 1)
We present a language for describing structural patterns of residues in protein structures and a method for the discovery of such patterns that recur in a set of protein structures. The patterns impose restrictions on the spatial position of each residue, their order along the amino acid chain, and which amino acids are allowed in each position. Unlike other methods for comparing sets of protein structures, our method is not based on the use of pairwise structure comparisons, which are often time consuming and can produce inconsistent results. Instead, the method simultaneously takes into account information from all structures in the search for conserved structure patterns which are potential structure motifs. The method is based on describing the spatial neighborhoods of each residue in each structure as a string and applying a sequence pattern discovery method to find patterns common to subsets of these strings. Finally it is checked whether the similarities between the neighborhood strings correspond to spatially similar substructures. We apply the method to analyze sets of very disparate proteins from the four different protein families: serine proteases, cuprodoxins, cysteine proteinases, and ferredoxins. The motifs found by the method correspond well to the site and motif information given in the annotation of these proteins in PDB, Swiss-Prot, and PROSITE. Furthermore, the motifs are confirmed by using the motif data to constrain the structural alignment of the proteins obtained with the program SAP. This gave the best superposition/alignment of the proteins given the motif assignment.
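The first step described here, turning each residue's spatial neighborhood into a string, can be sketched as follows. The details are assumptions for illustration only: one representative coordinate per residue, a fixed distance radius, the residue itself included in its own neighborhood, and neighbors written in chain order. The abstract does not specify how the published method encodes neighborhoods.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def neighborhood_strings(sequence: str, coords: Sequence[Point], radius: float) -> List[str]:
    """For each residue, concatenate the one-letter codes of all residues
    (itself included) within `radius` of it, in chain order."""
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    strings = []
    for center in coords:
        strings.append(
            "".join(
                aa
                for aa, xyz in zip(sequence, coords)
                if math.dist(center, xyz) <= radius
            )
        )
    return strings
```

A sequence pattern discovery method could then be run over the resulting strings from all structures in the set.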

12.
Protein structural alignments are generally considered as ‘golden standard’ for the alignment at the level of amino acid residues. In this study we have compared the quality of pairwise and multiple structural alignments of about 5900 homologous proteins from 718 families of known 3-D structures. We observe shifts in the alignment of regular secondary structural elements (helices and strands) between pairwise and multiple structural alignments. The differences between pairwise and multiple structural alignments within helical and β-strand regions often correspond to 4 and 2 residue positions respectively. Such shifts correspond approximately to “one turn” of these regular secondary structures. We have performed manual analysis explicitly on the family of protein kinases. We note shifts of one or two turns in helix-helix alignments obtained using pairwise and multiple structural alignments. Investigations on the quality of the equivalent helix-helix, strand-strand pairs in terms of their residue side-chain accessibilities have been made. Our results indicate that the quality of the pairwise alignments is comparable to that of the multiple structural alignments and, in fact, is often better. We propose that pairwise alignment of protein structures should also be used in formulation of methods for structure prediction and evolutionary analysis.

13.
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.

14.
MOTIVATION: The quality of a model structure derived from a comparative modeling procedure is dictated by the accuracy of the predicted sequence-template alignment. As the sequence-template pairs are increasingly remote in sequence relationship, the prediction of the sequence-template alignments becomes increasingly problematic with sequence alignment methods. Structural information of the template, used in connection with the sequence relationship of the sequence-template pair, could significantly improve the accuracy of the sequence-template alignment. In this paper, we describe a sequence-template alignment method that integrates sequence and structural information to enhance the accuracy of sequence-template alignments for distantly related protein pairs. RESULTS: The structure-dependent sequence alignment (SDSA) procedure was optimized for coverage and accuracy on a training set of 412 protein pairs; the structures for each of the training pairs are similar (RMSD < approximately 4 Å) but the sequence relationship is undetectable (average pair-wise sequence identity = 8%). The optimized SDSA procedure was then applied to extend PSI-BLAST local alignments by calculating the global alignments under the constraint of the residue pairs in the local alignments. This composite alignment procedure was assessed with a testing set of 1421 protein pairs, of which the pair-wise structures are similar (RMSD < approximately 4 Å) but the sequences are marginally related at best in each pair (average pair-wise sequence identity = 13%). The assessment showed that the composite alignment procedure predicted more aligned residue pairs, with an average increase of 27% in correctly aligned residues over the standard PSI-BLAST alignments for the protein pairs in the testing set.

15.
We examine how effectively simple potential functions previously developed can identify compatibilities between sequences and structures of proteins for database searches. The potential function consists of pairwise contact energies, repulsive packing potentials of residues for overly dense arrangement and short-range potentials for secondary structures, all of which were estimated from statistical preferences observed in known protein structures. Each potential energy term was modified to represent compatibilities between sequences and structures for globular proteins. Pairwise contact interactions in a sequence-structure alignment are evaluated in a mean field approximation on the basis of probabilities of site pairs to be aligned. Gap penalties are assumed to be proportional to the number of contacts at each residue position, and as a result gaps will be more frequently placed on protein surfaces than in cores. In addition to minimum energy alignments, we use probability alignments made by successively aligning site pairs in order by pairwise alignment probabilities. The results show that the present energy function and alignment method can detect well both folds compatible with a given sequence and, inversely, sequences compatible with a given fold, and yield mostly similar alignments for these two types of sequence and structure pairs. Probability alignments consisting of most reliable site pairs only can yield extremely small root mean square deviations, and including less reliable pairs increases the deviations. Also, it is observed that secondary structure potentials are usefully complementary to yield improved alignments with this method. Remarkably, by this method some individual sequence-structure pairs are detected having only 5-20% sequence identity.

16.
Black DJ  Tikunova SB  Johnson JD  Davis JP 《Biochemistry》2000,39(45):13831-13837
A series of N-terminal calmodulin (CaM) mutants was generated to probe the relationship between the N-terminal Ca²⁺ affinity and the number of paired, negatively charged Ca²⁺ chelating residues in the N-terminal Ca²⁺-binding sites of CaM. When the number of acid pairs [negatively charged residues at positions +x and -x (X-axis), +y and -y (Y-axis), and +z and -z (Z-axis)] was increased from zero to one and then to two, a progressive increase was seen in the N-terminal Ca²⁺ affinities. The maximal ranges of the increases observed in the N-terminal Ca²⁺ affinity were approximately 8-8.5-fold for site I, approximately 4.5-5-fold for site II, and approximately 11-fold for both sites, in comparison to the mutants containing no acid pairs. The maximal values of N-terminal Ca²⁺ affinity were bestowed by the presence of five acidic chelating residues in site I or II, individually. Addition of the sixth acidic chelating residue (third acid pair) to both N-terminal Ca²⁺-binding sites reduced the N-terminal Ca²⁺ affinity. The increases in Ca²⁺ affinity observed were caused by an increase in the Ca²⁺ association rates for the Y- and Z-axis acid pairs, while the X-axis acid pair caused a reduction in the Ca²⁺ dissociation rates.

17.
Balaji S  Aruna S  Srinivasan N 《Proteins》2003,53(4):783-791
Occurrence and accommodation of charged amino acid residues in proteins that are structurally equivalent to buried non-polar residues in homologues have been investigated. Using a dataset of 1,852 homologous pairs of crystal structures of proteins available at 2 Å or better resolution, 14,024 examples of apolar residues in the structurally conserved regions replaced by charged residues in homologues have been identified. Out of 2,530 cases of buried apolar residues, 1,677 of the equivalent charged residues in homologues are exposed and the rest of the charged residues are buried. These drastic substitutions are most often observed in homologous protein pairs with low sequence identity (<30%) and in large protein domains (>300 residues). Such buried charged residues in the large proteins are often located in the interface of sub-domains or in the interface of structural repeats. Beyond 7 Å of residue depth of buried apolar residues, or less than 4% of solvent accessibility, almost all the substituting charged residues are buried. It is also observed that acidic sidechains have higher preference to get buried than the positively charged residues. There is a preference for buried charged residues to get accommodated in the interior by forming hydrogen bonds with another sidechain than the main chain. The sidechains interacting with a buried charged residue are most often located in the structurally conserved regions of the alignment. About 50% of the observations involving hydrogen bond between buried charged sidechain and another sidechain correspond to salt bridges. Among the buried charged residues interacting with the main chain, positively charged sidechains form hydrogen bonds commonly with main chain carbonyls while the negatively charged residues are accommodated by hydrogen bonding with the main chain amides. These carbonyls and amides are usually located in the loops that are structurally variable among homologous proteins.

18.
19.
A protein structure comparison method is described that allows the generation of large populations of high-scoring alternate alignments. This was achieved by incorporating a random element into an iterative double dynamic programming algorithm. The maximum scores from repeated comparisons of a pair of structures converged on a value that was taken as the global maximum. This lay 15% over the score obtained from the single fixed (unrandomized) calculation. The effect of the gap penalty was observed through the shift of the alignment populations, characterized by their alignment length and root-mean-square deviation (RMSD). The best (lowest RMSD) values found in these populations provided a base-line against which other methods were compared.

20.
Fast, efficient, and reliable algorithms for pairwise alignment of protein structures are in ever-increasing demand for analyzing the rapidly growing data on protein structures. CLePAPS is a tool developed for this purpose. It distinguishes itself from other existing algorithms by the use of conformational letters, which are discretized states of 3D segmental structural states. A letter corresponds to a cluster of combinations of the three angles formed by Cα pseudobonds of four contiguous residues. A substitution matrix called CLESUM is available to measure the similarity between any two such letters. CLePAPS regards an aligned fragment pair (AFP) as an ungapped string pair with a high sum of pairwise CLESUM scores. Using CLESUM scores as the similarity measure, CLePAPS searches for AFPs by simple string comparison. The transformation which best superimposes a highly similar AFP can be used to superimpose the structure pairs under comparison. A highly scored AFP which is consistent with several other AFPs determines an initial alignment. CLePAPS then joins consistent AFPs guided by their similarity scores to extend the alignment by several "zoom-in" iteration steps. A follow-up refinement produces the final alignment. CLePAPS does not implement dynamic programming. The utility of CLePAPS is tested on various protein structure pairs.
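The AFP search by string comparison can be sketched as a sliding-window scan over two strings of conformational letters. This is a simplified illustration and not the CLePAPS implementation: `toy_letter_score` is a made-up stand-in for the CLESUM matrix, and the fixed window length and threshold are assumptions. The later steps (superposition, consistency checks, zoom-in extension) are not shown.

```python
from typing import Callable, List, Tuple


def toy_letter_score(a: str, b: str) -> int:
    """Hypothetical substitution score; the real CLESUM values are not used here."""
    return 2 if a == b else -1


def find_afps(
    s1: str,
    s2: str,
    length: int,
    threshold: int,
    score: Callable[[str, str], int] = toy_letter_score,
) -> List[Tuple[int, int, int]]:
    """Return (start in s1, start in s2, summed score) for every ungapped
    window pair of the given length whose summed score reaches the
    threshold, best-scoring first."""
    hits = []
    for i in range(len(s1) - length + 1):
        for j in range(len(s2) - length + 1):
            total = sum(score(s1[i + k], s2[j + k]) for k in range(length))
            if total >= threshold:
                hits.append((i, j, total))
    return sorted(hits, key=lambda hit: -hit[2])
```

Because the comparison is ungapped, each candidate is a plain pair of substrings, which is what makes the search a string comparison rather than a dynamic program.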
