Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
MOTIVATION: Existing algorithms for automated protein structure alignment generate contradictory results and are difficult to interpret. An algorithm which can provide a context for interpreting the alignment and uses a simple method to characterize protein structure similarity is needed. RESULTS: We describe a heuristic for limiting the search space for structure alignment comparisons between two proteins, and an algorithm for finding minimal root-mean-squared-distance (RMSD) alignments as a function of the number of matching residue pairs within this limited search space. Our alignment algorithm uses coordinates of alpha-carbon atoms to represent each amino acid residue and requires a total computation time of O(m^3 n^2), where m and n denote the lengths of the protein sequences. This makes our method fast enough for comparisons of moderate-size proteins (fewer than approximately 800 residues) on current workstation-class computers and therefore addresses the need for a systematic analysis of multiple plausible shape similarities between two proteins using a widely accepted comparison metric.
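The comparison metric named in this abstract, RMSD over matched alpha-carbon pairs, can be sketched as follows. This is only an illustration of the metric itself: it assumes the two coordinate lists are already paired residue by residue and already superposed, and it does not perform the search for the minimal-RMSD alignment that the abstract describes. The function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) alpha-carbon positions.

    Assumption: the lists are already matched pair by pair and superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance for one matched residue pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Reporting this value for each number of matched pairs would give the kind of RMSD-versus-pair-count profile the abstract refers to.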

2.

Background

Protein structure comparison plays an important role in the in silico functional prediction of a new protein. It is also used for understanding the evolutionary relationships among proteins. A variety of methods have been proposed in the literature for comparing protein structures, but they have limitations in accuracy and in computational time and space complexity. There is a need to reduce the computational cost of protein comparison and alignment by incorporating important biological and structural properties into existing techniques.

Results

An efficient algorithm has been developed for comparing protein structures using elastic shape analysis, in which the sequence of 3D atomic coordinates of a protein structure is supplemented by auxiliary information from side-chain properties. The protein structure is represented by a special function called the square-root velocity function. Singular value decomposition and dynamic programming are employed for optimal rotation and optimal matching of the proteins, respectively. The geodesic distance is calculated and used as the dissimilarity score between two protein structures. The developed algorithm was tested and found to be more efficient than existing methods: running time was reduced by 80–90 % without compromising the accuracy of comparison. Source code for the different functions has been developed in R. A user-friendly web-based application called ProtSComp, which uses the above algorithm for comparing protein 3D structures, has also been developed and is freely accessible.

Conclusions

The methodology and algorithm developed in this study take considerably less computational time without loss of accuracy (Table 2). The proposed algorithm considers different criteria for representing protein structures, using 3D coordinates of atoms and including residue-wise molecular properties as auxiliary information.
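The square-root velocity function mentioned in the Results of this abstract can be illustrated with a minimal discrete sketch. It assumes the curve is a list of 3D points sampled at unit parameter steps and approximates the velocity by forward differences; the function name `srvf` is hypothetical, and the authors' R implementation (not shown in the abstract) may differ in discretization and in how the auxiliary side-chain information is attached.

```python
import math


def srvf(points):
    """Discrete square-root velocity function of a 3D curve.

    Each velocity v (forward difference, unit step assumed) is mapped to
    v / sqrt(|v|); a zero velocity maps to the zero vector.
    """
    q = []
    for p0, p1 in zip(points, points[1:]):
        v = [b - a for a, b in zip(p0, p1)]
        speed = math.sqrt(sum(c * c for c in v))
        if speed == 0.0:
            q.append([0.0, 0.0, 0.0])
        else:
            scale = 1.0 / math.sqrt(speed)
            q.append([c * scale for c in v])
    return q
```

A useful property to check is that the squared length of each output vector equals the speed of the corresponding segment.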

3.

Background

A protein structure can be determined by solving a so-called distance geometry problem whenever a set of inter-atomic distances is available and sufficient. However, the problem is intractable in general and has been proved to be NP-hard. An updated geometric build-up algorithm (UGB) has been developed recently that controls numerical errors and is efficient in protein structure determination for cases where only sparse exact distance data are available. In this paper, the UGB method is improved and revised with the aim of solving distance geometry problems more efficiently and effectively.

Methods

An efficient algorithm (called the revised updated geometric build-up algorithm (RUGB)) to build up a protein structure from atomic distance data is presented and provides an effective way of determining a protein structure with sparse exact distance data. In the algorithm, the condition to determine an unpositioned atom iteratively is relaxed (when compared with the UGB algorithm) and data structure techniques are used to make the algorithm more efficient and effective. The algorithm is tested on a set of proteins selected randomly from the Protein Structure Database-PDB.

Results

We test a set of proteins selected randomly from the Protein Structure Database-PDB. We show that the numerical errors produced by the new RUGB algorithm are smaller when compared with the errors of the UGB algorithm and that the novel RUGB algorithm has a significantly smaller runtime than the UGB algorithm.

Conclusions

The RUGB algorithm relaxes the condition for updating and incorporates a data structure for accessing the neighbours of an atom. The revisions improve on the UGB algorithm in two important areas: a reduction in the overall runtime and a decrease in the numerical error.
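The basic step behind a geometric build-up, placing one unpositioned atom from exact distances to already positioned atoms, can be sketched as below. This is a generic illustration and not the UGB or RUGB algorithm: it assumes four non-coplanar base atoms with exact distances, and the names `place_atom` and `det3` are hypothetical. Subtracting the first sphere equation from the other three gives a 3x3 linear system, solved here with Cramer's rule.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def place_atom(base, dists):
    """Locate a point from its distances to four non-coplanar base atoms."""
    p1, d1 = base[0], dists[0]

    def sq(p):
        return sum(c * c for c in p)

    rows, rhs = [], []
    for p, d in zip(base[1:4], dists[1:4]):
        # From |x - p|^2 = d^2 minus |x - p1|^2 = d1^2:
        # 2 (p - p1) . x = |p|^2 - |p1|^2 - d^2 + d1^2
        rows.append([2.0 * (a - b) for a, b in zip(p, p1)])
        rhs.append(sq(p) - sq(p1) - d * d + d1 * d1)
    det = det3(rows)
    if abs(det) < 1e-12:
        raise ValueError("base atoms are (nearly) coplanar")
    solution = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]  # Cramer's rule: swap in the right-hand side
        solution.append(det3(m) / det)
    return tuple(solution)
```

With noisy or sparse distances this direct solve accumulates error as atoms are placed one after another, which is the numerical issue the abstract says the updated algorithms are designed to control.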

4.
Multiple protein structure alignment.   Cited by: 5 (self-citations: 2, citations by others: 3)
A method was developed to compare protein structures and to combine them into a multiple structure consensus. Previous methods of multiple structure comparison have only concatenated pairwise alignments or produced a consensus structure by averaging coordinate sets. The current method is a fusion of the fast structure comparison program SSAP and the multiple sequence alignment program MULTAL. As in MULTAL, structures are progressively combined, producing intermediate consensus structures that are compared directly to each other and all remaining single structures. This leads to a hierarchic "condensation," continually evaluated in the light of the emerging conserved core regions. Following the SSAP approach, all interatomic vectors were retained with well-conserved regions distinguished by coherent vector bundles (the structural equivalent of a conserved sequence position). Each bundle of vectors is summarized by a resultant, whereas vector coherence is captured in an error term, which is the only distinction between conserved and variable positions. Resultant vectors are used directly in the comparison, which is weighted by their error values, giving greater importance to the matching of conserved positions. The resultant vectors and their errors can also be used directly in molecular modeling. Applications of the method were assessed by the quality of the resulting sequence alignments, phylogenetic tree construction, and databank scanning with the consensus. Visual assessment of the structural superpositions and consensus structure for various well-characterized families confirmed that the consensus had identified a reasonable core.
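The idea of summarizing a bundle of vectors by a resultant plus an error term can be sketched as follows. The abstract does not give the formulas, so both definitions here are assumptions made for illustration: the resultant is taken as the component-wise mean of the bundle, and the error as the mean squared distance of the members from that mean. The name `summarize_bundle` is hypothetical.

```python
def summarize_bundle(vectors):
    """Return (resultant, error) for a non-empty bundle of 3D vectors.

    Assumed definitions: resultant = component-wise mean;
    error = mean squared distance of the vectors from the resultant.
    """
    n = len(vectors)
    if n == 0:
        raise ValueError("bundle must contain at least one vector")
    resultant = tuple(sum(v[i] for v in vectors) / n for i in range(3))
    error = sum(
        sum((v[i] - resultant[i]) ** 2 for i in range(3)) for v in vectors
    ) / n
    return resultant, error
```

Under these assumptions a coherent bundle (a conserved position) has an error near zero, while a bundle of scattered vectors has a large error and would receive less weight in the comparison.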

5.
C A Orengo, N P Brown, W R Taylor. Proteins, 1992, 14(2):139-167
A fast method is described for searching and analyzing the protein structure databank. It uses secondary structure followed by residue matching to compare protein structures and is developed from a previous structural alignment method based on dynamic programming. Linear representations of secondary structures are derived and their features compared to identify equivalent elements in two proteins. The secondary structure alignment then constrains the residue alignment, which compares only residues within aligned secondary structures and with similar buried areas and torsional angles. The initial secondary structure alignment improves accuracy and provides a means of filtering out unrelated proteins before the slower residue alignment stage. It is possible to search or sort the protein structure databank very quickly using just secondary structure comparisons. A search through 720 structures with a probe protein of 10 secondary structures required 1.7 CPU hours on a Sun 4/280. Alternatively, combined secondary structure and residue alignments, with a cutoff on the secondary structure score to remove pairs of unrelated proteins from further analysis, took 10.1 CPU hours. The method was applied in searches on different classes of proteins and to cluster a subset of the databank into structurally related groups. Relationships were consistent with known families of protein structure.

6.
Patterns of receptor-ligand interaction can be conserved in functionally equivalent proteins even in the absence of sequence homology. Therefore, structural comparison of ligand-binding pockets and their pharmacophoric features allows for the characterization of so-called "orphan" proteins with known three-dimensional structure but unknown function, and for the prediction of ligand promiscuity of binding pockets. We present an algorithm for rapid pocket comparison (PoLiMorph), in which protein pockets are represented by self-organizing graphs that fill the volume of the cavity. Vertices in these three-dimensional frameworks contain information about the local ligand-receptor interaction potential coded by fuzzy property labels. For framework matching, we developed a fast heuristic based on the maximum dispersion problem, as an alternative to techniques utilizing clique detection or geometric hashing algorithms. A sophisticated scoring function was applied that incorporates knowledge about property distributions and ligand-receptor interaction patterns. In an all-against-all virtual screening experiment with 207 pocket frameworks extracted from a subset of PDBbind, PoLiMorph correctly assigned 81% of 69 distinct structural classes and demonstrated sustained ability to group pockets accommodating the same ligand chemotype. We determined a score threshold that indicates "true" pocket similarity with high reliability, which not only supports structure-based drug design but also allows for sequence-independent studies of the proteome.
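For readers unfamiliar with the maximum dispersion problem named in this abstract, the sketch below shows a generic greedy heuristic for it: choose k points so that each newly chosen point is as far as possible from those already chosen. This is not PoLiMorph's framework-matching heuristic, whose details the abstract does not give; the function name `greedy_dispersion` and the choice of the first point as seed are assumptions.

```python
import math


def greedy_dispersion(points, k):
    """Greedily pick k point indices that are spread far apart.

    Each step adds the point whose minimum distance to the already
    chosen points is largest (farthest-point heuristic).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [0]  # arbitrary seed: the first point (an assumption)
    while len(chosen) < k:
        best_index, best_gap = None, -1.0
        for i, p in enumerate(points):
            if i in chosen:
                continue
            gap = min(math.dist(p, points[c]) for c in chosen)
            if gap > best_gap:
                best_index, best_gap = i, gap
        chosen.append(best_index)
    return chosen
```

The greedy choice is fast but not guaranteed to be optimal, which is the usual trade-off for heuristics of this kind.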

7.
Jia M, Luo L, Liu C. Biopolymers, 2004, 73(1):16-26
A new integrated sequence-structure database, called IADE (Integrated ASTRAL-DSSP-EMBL), incorporating matching mRNA sequence, amino acid sequence, and protein secondary structural data, is constructed. It includes 648 protein domains. Based on the IADE database, we studied the relation between RNA stem-loop frequencies and protein secondary structure. It was found that the alpha-helices and beta-strands of proteins tend to be preferably "coded" by the mRNA stem region, while the coils of proteins tend to be preferably "coded" by the mRNA loop region. These tendencies are more obvious if we observe the structural words (SWs). An SW is defined as a four-amino-acid fragment that shows a pronounced secondary structural (alpha-helix or beta-strand) propensity. It is demonstrated that the deduced correlation between protein and mRNA structure can hardly be explained as a stochastic fluctuation effect.

8.

Background

Given the known limitations of both experimental and theoretical techniques for obtaining a protein structure, a fast and accurate protein structure evaluation method is still needed to substantially reduce the large gap between the number of known sequences and the number of known structures. Among the theoretical techniques currently in practice, homology modelling backed by molecular dynamics based optimization appears to be the most popular. However, it suffers from contradictory indications from the different validation parameters generated for a set of protein models predicted against a particular target protein. For example, a model may have a high Ramachandran score, making it acceptable, while its potential energy is not very low, making it unacceptable, and vice versa. To resolve this problem, the main objective of this study was to use a simple experimentally derived output, the Surface Roughness Index of the protein of unknown structure, as an intervening agent; it can be obtained from ordinary microscopic images of heat-denatured aggregates of the same protein.

Result

It was intriguing to observe that direct experimental knowledge of the protein concerned, however simple, can give insight into the acceptability of a particular structural model out of a confusion set of models generated by a database-driven comparative technique for structure prediction. The results obtained from proteins of widely varying structural classes indicated that the speed of protein structure evaluation can be further enhanced, without compromising accuracy, by using simple experimental output.

Conclusion

In this work, a semi-empirical methodological approach was provided for improving protein structure evaluation. It showed that, once structure models of a protein have been obtained through homology modelling, the problem of selecting the best model out of a confusion set of Pareto-optimal structures can be resolved by employing a structural agent obtained directly through experiment on the same protein. Overall, where a reasonably accurate protein structure is needed for pathogens causing epidemics or used in biological warfare, such an approach could be a plausible route to fast drug design.
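The "confusion set of Pareto-optimal structures" arises because no single model is best on every validation parameter. A minimal sketch of how such a set can be extracted is given below, assuming just two criteria: a Ramachandran score (higher is better) and a potential energy (lower is better). The function name `pareto_front`, the tuple layout, and the numbers in the usage note are hypothetical and are not taken from the study.

```python
def pareto_front(models):
    """Return names of models not dominated by any other model.

    models: list of (name, ramachandran_score, potential_energy).
    Model B dominates model A if B is at least as good on both criteria
    and strictly better on at least one.
    """
    front = []
    for name, rama, energy in models:
        dominated = any(
            (r2 >= rama and e2 <= energy) and (r2 > rama or e2 < energy)
            for _, r2, e2 in models
        )
        if not dominated:
            front.append(name)
    return front
```

For example, with made-up values `[("m1", 95.0, -1200.0), ("m2", 90.0, -1500.0), ("m3", 88.0, -1100.0)]`, models m1 and m2 remain because each wins on one criterion, while m3 is dropped. Choosing between m1 and m2 is the step for which the abstract proposes an additional experimental measurement.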

9.
The human cytomegalovirus DNA polymerase is composed of a catalytic subunit, UL54, and an accessory protein, UL44, which has a structural fold similar to that of other processivity factors, including herpes simplex virus UL42 and homotrimeric sliding clamps such as proliferating cell nuclear antigen. Several specific residues in the C-terminal region of UL54 and in the "connector loop" of UL44 are required for the association of these proteins. Here, we describe the crystal structure of residues 1-290 of UL44 in complex with a peptide from the extreme C terminus of UL54, which explains this interaction at a molecular level. The UL54 peptide binds to structural elements similar to those used by UL42 and the sliding clamps to associate with their respective binding partners. However, the details of the interaction differ from those of other processivity factor-peptide complexes. Crucial residues include a three-residue hydrophobic "plug" from the UL54 peptide and Ile135 of UL44, which forms a critical intramolecular hydrophobic anchor for interactions between the connector loop and the peptide. As was the case for the unliganded UL44 structure, the UL44-peptide complex forms a head-to-head dimer that could potentially form a C-shaped clamp on DNA. However, the peptide-bound structure displays subtle differences in the relative orientation of the two subdomains of the protein, resulting in a more open clamp, which we predicted would affect its association with DNA. Indeed, filter binding assays revealed that peptide-bound UL44 binds DNA with higher affinity. Thus, interaction with the catalytic subunit appears to affect both the structure and function of UL44.

10.
A substructure matching algorithm is described that can be used for the automatic identification of secondary structural motifs in three-dimensional protein structures from the Protein Data Bank. The proteins and motifs are stored for searching as labelled graphs, with the nodes of a graph corresponding to linear representations of helices and strands and the edges to the inter-line angles and distances. A modification of Ullmann's subgraph isomorphism algorithm is described that can be used to search these graph representations. Tests with patterns from the protein structure literature demonstrate both the efficiency and the effectiveness of the search procedure, which has been implemented in FORTRAN 77 on a MicroVAX-II system, coupled to the molecular fitting program FRODO on an Evans and Sutherland PS300 graphics system.
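Searching a labelled protein graph for a motif graph can be sketched with plain backtracking, as below. This is a simplified stand-in and not the modified Ullmann algorithm of the abstract: it omits Ullmann's refinement step, and it assumes edge labels are discrete values compared for equality rather than angles and distances compared within tolerances. The function name `find_motif` and the data layout are hypothetical.

```python
def find_motif(p_labels, p_edges, t_labels, t_edges):
    """Map every pattern node to a distinct target node, or return None.

    *_labels: dict node -> node label (e.g. "H" helix, "E" strand).
    *_edges:  dict frozenset({u, v}) -> edge label.
    Every pattern edge must exist in the target with the same label.
    """
    order = list(p_labels)

    def consistent(p_node, t_node, mapping):
        if p_labels[p_node] != t_labels[t_node]:
            return False
        for q_node, u_node in mapping.items():
            key = frozenset((p_node, q_node))
            if key in p_edges:
                if t_edges.get(frozenset((t_node, u_node))) != p_edges[key]:
                    return False
        return True

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        p_node = order[len(mapping)]
        used = set(mapping.values())
        for t_node in t_labels:
            if t_node not in used and consistent(p_node, t_node, mapping):
                mapping[p_node] = t_node
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[p_node]  # backtrack and try the next candidate
        return None

    return extend({})
```

Ullmann-style refinement prunes candidate node pairs before and during this search, which is where most of the efficiency of the real algorithm comes from.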

11.
Protein topology can be described at different levels. At the most fundamental level, it is a sequence of secondary structure elements (a "primary topology string"). Searching predicted primary topology strings against a library of strings from known protein structures is the basis of some protein fold recognition methods. Here a method known as TOPSCAN is presented for rapid comparison of protein structures. Rather than a simple two-letter alphabet (encoding strand and helix), more complex alphabets are used encoding direction, proximity, accessibility and length of secondary elements and loops in addition to secondary structure. Comparisons are made between the structural information content of primary topology strings and encodings which contain additional information ("secondary topology strings"). The algorithm is extremely fast, with a scan of a large domain against a library of more than 2000 secondary structure strings completing in approximately 30 s. Analysis of protein fold similarity using TOPSCAN at primary and secondary topology levels is presented.
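Comparing two topology strings can be illustrated with a standard global alignment score computed by dynamic programming. The abstract does not state how TOPSCAN scores strings, so the algorithm choice, the two-letter alphabet ("E" strand, "H" helix) in the usage note, and the scoring parameters below are all assumptions; the function name `align_score` is hypothetical.

```python
def align_score(s, t, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score of two strings.

    Only the previous row of the score table is kept, so memory use is
    proportional to len(t).
    """
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]
```

With these parameters, `align_score("EHE", "EHE")` is 6 and `align_score("EHE", "EE")` is 2, the latter corresponding to aligning the two strands and leaving the helix against a gap. A richer alphabet would only change the characters and the substitution scores, not the recurrence.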

12.
MOTIVATION: The Secondary-Structure Guided Superposition tool (SSGS) is a permissive secondary structure-based algorithm for matching of protein structures and in particular their fragments. The algorithm was developed towards protein structure prediction via fragment assembly. RESULTS: In a fragment-based structural prediction scheme, a protein sequence is cut into building blocks (BBs). The BBs are assembled to predict their relative 3D arrangement. Finally, the assemblies are refined. To implement this prediction scheme, a clustered structural library representing sequence patterns for protein fragments is essential. To create a library, BBs generated by cutting proteins from the PDB are compared and structurally similar BBs are clustered. To allow structural comparison and clustering of the BBs, which are often relatively short with flexible loops, we have devised SSGS. SSGS maintains high similarity between cluster members and is highly efficient. When it comes to comparing BBs for clustering purposes, the algorithm obtains better results than other, non-secondary structure guided protein superimposition algorithms.

13.
A program for template matching of protein sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
The matching of a template to a protein sequence is simplified by treating it as a special case of sequence alignment. Restriction of the distances between motifs in the template controls against spurious matches within very long sequences. The program using this algorithm is fast enough to be used in scanning large databases for sequences matching a complex template. Received on August 17, 1987; accepted on January 11, 1988
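The effect of restricting distances between motifs can be illustrated with a regular expression in which consecutive motifs are joined by bounded-length spacers. Note that this swaps in regex matching for the alignment-based algorithm of the abstract, which is not reproduced here; the function name `compile_template` and the motifs in the usage note are hypothetical.

```python
import re


def compile_template(motifs, gaps):
    """Build a regex for motifs separated by spacers of bounded length.

    motifs: list of regex fragments over one-letter residue codes.
    gaps:   list of (min_len, max_len), one per pair of adjacent motifs.
    """
    if len(gaps) != len(motifs) - 1:
        raise ValueError("need exactly one gap range between adjacent motifs")
    parts = [motifs[0]]
    for (lo, hi), motif in zip(gaps, motifs[1:]):
        parts.append(".{%d,%d}" % (lo, hi))  # spacer of lo..hi residues
        parts.append(motif)
    return re.compile("".join(parts))
```

For instance, `compile_template(["G.GK", "D..G"], [(2, 4)])` matches `"AAGAGKTTTDAAGCC"` starting at position 2, but does not match when ten residues separate the two motifs. Without the upper bound on the spacer, two unrelated motif occurrences far apart in a long sequence would count as a hit.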

14.
15.
We combine a new, extremely fast technique to generate a library of low energy structures of an oligopeptide (by using mutually orthogonal Latin squares to sample its conformational space) with a genetic algorithm to predict protein structures. The protein sequence is divided into oligopeptides, and a structure library is generated for each. These libraries are used in a newly defined mutation operator that, together with variation, crossover, and diversity operators, is used in a modified genetic algorithm to make the prediction. Application to five small proteins has yielded near native structures.
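Mutually orthogonal Latin squares (MOLS) can be constructed directly when the order is prime, which the sketch below does and then verifies. It covers only the combinatorial object; how the squares are mapped onto torsion-angle values and how energies are evaluated in the authors' sampling technique are not described in the abstract and are not attempted here. The function names `mols` and `are_orthogonal` are hypothetical.

```python
def mols(n):
    """Return n-1 mutually orthogonal Latin squares of prime order n.

    Square k (1 <= k < n) has entry (k*i + j) mod n at row i, column j.
    """
    if n < 2 or any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        raise ValueError("this construction needs a prime order")
    return [
        [[(k * i + j) % n for j in range(n)] for i in range(n)]
        for k in range(1, n)
    ]


def are_orthogonal(a, b):
    """True if superimposing squares a and b yields every ordered pair once."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n
```

The appeal for sampling is that the n*n cells of a set of such squares cover the value combinations of several parameters evenly, using far fewer points than the full grid of all combinations.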

16.
MOTIVATION: Knots in polypeptide chains have been found in very few proteins, and consequently should be generally avoided in protein structure prediction methods. Most effective structure prediction methods do not model the protein folding process itself, but rather seek only to correctly obtain the final native state. Consequently, the mechanisms that prevent knots from occurring in native proteins are not relevant to the modeling process, and as a result, knots can occur with significantly higher frequency in protein models. Here we describe Knotfind, a simple algorithm for knot detection that is fast enough for structure prediction, where tens or hundreds of thousands of conformations may be sampled during the course of a prediction. We have used this algorithm to characterize knots in large populations of model structures generated for targets in CASP5 and CASP6 using the Rosetta homology-based modeling method. RESULTS: Analysis of CASP5 models suggested several possible avenues for introduction of knots into these models, and these insights were applied to structure prediction in CASP6, resulting in a significant decrease in the proportion of knotted models generated. Additionally, using the knot detection algorithm on structures in the Protein Data Bank, a previously unreported deep trefoil knot was found in acetylornithine transcarbamylase. AVAILABILITY: The Knotfind algorithm is available in the Rosetta structure prediction program at http://www.rosettacommons.org.

17.
With the rapid development of structure determination of target proteins for human diseases, drug discovery based on high-throughput virtual screening is gradually gaining popularity. In this paper, a fast docking algorithm (H-DOCK) based on hydrogen bond matching and surface shape complementarity was developed. In H-DOCK, an enumeration approach based on a divide-and-conquer strategy is first applied to rank the intermolecular modes between protein and ligand by maximizing their hydrogen bond matching; each docked conformation of the ligand is then calculated according to the matched hydrogen bonding geometry; finally, a simple but effective scoring function reflecting mainly the van der Waals interaction is used to evaluate the docked conformations of the ligand. H-DOCK was tested for both rigid and flexible ligand docking; the latter is implemented by repeating rigid docking for multiple conformations of a small molecule and ranking all results together. For rigid ligands, H-DOCK was tested on a set of 271 complexes with at least one intermolecular hydrogen bond, and achieved a success rate (RMSD < 2.0 Å) of 91.1%. For flexible ligands, H-DOCK was tested on another set of 93 complexes, where each case was a conformation ensemble containing the native ligand conformation as well as 100 decoy conformations generated by AutoDock [1], and the success rate reached 81.7%. The high success rate of H-DOCK indicates that hydrogen bonding and steric hindrance capture the key interactions between protein and ligand. H-DOCK is quite efficient compared with conventional docking algorithms: on average it takes only about 0.14 seconds for a rigid ligand docking and about 8.25 seconds for a flexible one. These preliminary docking results imply that H-DOCK could be used in large-scale virtual screening as a pre-filter for a more accurate but less efficient docking algorithm.

18.
Li CH, Ma XH, Chen WZ, Wang CX. Proteins, 2003, 52(1):47-50
An efficient soft docking algorithm is described for predicting the mode of binding between an antibody and its antigen based on the three-dimensional structures of the molecules. The basic tools are the "simplified protein" model and the docking algorithm of Wodak and Janin. The side-chain flexibility of Arg, Lys, Asp, Glu, and Met residues on the protein surface is taken into account. A combined filtering technique is used to select candidate binding modes. After energy minimization, we calculate a scoring function, which includes electrostatic and desolvation energy terms. This procedure was applied to targets 04, 05, and 06 of CAPRI, which are complexes of three different camelid antibody VHH variable domains with pig alpha-amylase. For target 06, two native-like structures with a root-mean-square deviation < 4.0 Å relative to the X-ray structure were found within the five top ranking structures. For targets 04 and 05, our procedure produced models where more than half of the antigen residues forming the epitope were correctly predicted, albeit with a wrong VHH domain orientation. Thus, our soft docking algorithm is a promising tool for predicting antibody-antigen recognition.

19.
Using indirect protein-protein interactions for protein complex prediction   Cited by: 1 (self-citations: 0, citations by others: 1)
Protein complexes are fundamental for understanding principles of cellular organizations. As the sizes of protein-protein interaction (PPI) networks are increasing, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially in situations where the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network, and merges cliques to form clusters using a "partial clique merging" method. Experiments show that (1) the use of indirect interactions and topological weight to augment protein-protein interactions can be used to improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no other information except the original PPI network is used, our approach would be very useful for protein complex prediction, especially for prediction of novel protein complexes.
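The notion of level-2 neighbors, proteins that do not interact directly but share interaction partners, can be made concrete with the sketch below. The weight used is a plain Jaccard overlap of partner sets, chosen only for illustration; it is not FS-Weight, whose formula the abstract does not give. The function name `level2_pairs` and the toy network in the usage note are hypothetical.

```python
from itertools import combinations


def level2_pairs(adj):
    """Weight pairs of proteins that share partners but do not interact.

    adj: dict protein -> set of direct interaction partners (symmetric).
    Returns {(u, v): weight} with u < v, where weight is the Jaccard
    overlap of the two partner sets (an illustrative stand-in weight).
    """
    weights = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # direct interaction, not a level-2 pair
        shared = adj[u] & adj[v]
        if shared:
            weights[(u, v)] = len(shared) / len(adj[u] | adj[v])
    return weights
```

In a four-protein ring A-B-D-C-A, for example, the pairs (A, D) and (B, C) are level-2 neighbors with weight 1.0 because each pair has identical partner sets. Pairs scoring above a chosen threshold would then be added as edges before clustering, in the spirit of the network modification the abstract describes.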

20.
The crystal structure of the scaffolding protein CheW from Thermoanaerobacter tengcongensis (TtCheW) is reported at a resolution of 2.2 Å, solved using molecular replacement. Based on the crystal structure of TmCheA P4-P5-TmCheW from Thermotoga maritima reported by others, we modeled the TmCheA P4-P5-TtCheW complex and predicted that TtCheW is involved in a hydrophobic interaction with CheA, similar to that for TmCheW. We also found that the conserved motif "NxxGxIxP" from CheW plays an important role in CheA binding. The coincidence of the reported mutation sites related to CheW-MCP binding and the predicted protein interaction region within the TtCheW molecule suggests that CheW-MCP binding sites lie in the groove-shaped area between TtCheW and the CheA P4 domain within the assembled model.

