Similar literature
1.
Protein structure alignment is a fundamental problem in computational and structural biology. While there have been many experimental/heuristic methods and empirical results, very few results are known regarding the algorithmic/complexity aspects of the problem, especially for protein local structure alignment. A well-known measure of the similarity of two polygonal chains is the Fréchet distance, and a related discrete Fréchet distance has recently been used in protein-related research. In this paper, following the recent work of Jiang et al., we investigate the protein local structural alignment problem using bounded discrete Fréchet distance. Given m proteins (or protein backbones, which are 3D polygonal chains), each of length O(n), our main results are summarized as follows: (1) If the number of proteins, m, is not part of the input, then the problem is NP-complete; moreover, under bounded discrete Fréchet distance it is NP-hard to approximate the maximum size common local structure within a factor of n^(1-ε). These results hold both when all the proteins are static and when translation/rotation are allowed. (2) If the number of proteins, m, is a constant, then there is a polynomial time solution for the problem.
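
As an illustration of the measure named in this abstract, the following is a minimal sketch of the standard dynamic program for the discrete Fréchet distance between two 3D polygonal chains. It is not the paper's alignment algorithm: it assumes both chains are static (no translation or rotation), and the function name `discrete_frechet` and the toy coordinates are hypothetical.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two chains of 3D points."""
    n, m = len(p), len(q)
    # ca[i][j] holds the coupling distance for the prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance along p, along q, or along both, whichever is cheapest.
                ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1]), d)
    return ca[n - 1][m - 1]


# Toy chains (assumed data): two parallel segments one unit apart.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
chain_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
print(discrete_frechet(chain_a, chain_b))
```

The table has one entry per pair of vertices, so the sketch runs in time proportional to the product of the two chain lengths.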

2.
MOTIVATION: Multiple sequence alignment is an important tool to understand and analyze functions of homologous proteins. However, the logic of residue conservation/variation is usually apparent only in three-dimensional (3D) space, not on a primary sequence level. Thus, in a traditional multiple alignment it is often difficult to directly visualize and analyze key residues because they are masked by other residues along the alignment. Here we present an integrated multiple alignment and 3D structure visualization program that can (1) map and highlight residues from a 1D alignment onto a 3D structure and vice versa and (2) display only the alignment of preselected, key residues. This program, called Visualize Structure Sequence Alignment, also has many other built-in tools that can help analyze multiple sequence alignments. AVAILABILITY: http://bioinformatics.burnham.org/liwz/vissa CONTACT: liwz@burnham.org.

3.
SUMMARY: Contact maps are a valuable visualization tool in structural biology. They are a convenient way to display proteins in two dimensions and to quickly identify structural features such as domain architecture, secondary structure and contact clusters. We developed a tool called CMView which integrates rich contact map analysis with 3D visualization using PyMol. Our tool provides functions for contact map calculation from structure, basic editing, visualization in contact map and 3D space and structural comparison with different built-in alignment methods. A unique feature is the interactive refinement of structural alignments based on user selected substructures. AVAILABILITY: CMView is freely available for Linux, Windows and MacOS. The software and a comprehensive manual can be downloaded from http://www.bioinformatics.org/cmview/. The source code is licensed under the GNU General Public License.

4.
We present computational solutions to two problems of macromolecular structure interpretation from reconstructed three-dimensional electron microscopy (3D-EM) maps of large bio-molecular complexes at intermediate resolution (5 Å-15 Å). The two problems addressed are: (a) 3D structural alignment (matching) between identified and segmented 3D maps of structure units (e.g. trimeric configuration of proteins), and (b) the secondary structure identification of a segmented protein 3D map (i.e. locations of α-helices, β-sheets). For problem (a), we present an efficient algorithm to correlate spatially (and structurally) two 3D maps of structure units. Besides providing a similarity score between structure units, the algorithm yields an effective technique for resolution refinement of repeated structure units, by 3D alignment and averaging. For problem (b), we present an efficient algorithm to compute eigenvalues and link eigenvectors of a Gaussian convoluted structure tensor derived from the protein 3D map, thereby identifying and locating secondary structural motifs of proteins. The efficiency and performance of our approach are demonstrated on several experimentally reconstructed 3D maps of virus capsid shells from single-particle cryo-EM, as well as computationally simulated protein structure density 3D maps generated from protein model entries in the Protein Data Bank.

5.

Background

Despite continuing progress in X-ray crystallography and high-field NMR spectroscopy for determination of three-dimensional protein structures, the number of unsolved and newly discovered sequences grows much faster than that of determined structures. With the development of computational science, protein modeling methods may bridge this huge sequence-structure gap. A grand challenge is to predict a protein's three-dimensional structure from its primary structure (residue sequence) alone. Predicting residue contact maps is a crucial and promising intermediate step towards final three-dimensional structure prediction. Better predictions of local and non-local contacts between residues can transform protein sequence alignment into structure alignment, which can in turn greatly improve template-based three-dimensional protein structure predictors.

Methods

CNNcon, an improved contact map predictor based on multiple neural networks, using six sub-networks and one final cascade-network, was developed in this paper. Both the sub-networks and the final cascade-network were trained and tested with their corresponding data sets. For testing, the target protein was first coded and then input to its corresponding sub-networks for prediction. After that, the intermediate results were input to the cascade-network to produce the final prediction.

Results

CNNcon accurately predicts 58.86% of contacts on average at a distance cutoff of 8 Å for proteins with lengths ranging from 51 to 450. The comparison results show that the present method performs better than the compared state-of-the-art predictors. In particular, the prediction accuracy remains steady as the protein sequence length increases. This indicates that CNNcon overcomes the thin density problem, with which other current predictors have trouble. This advantage makes the method valuable for the prediction of long proteins.

6.
We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it. We find that some of the large motifs map onto known functional regions in two protein families explored in this study, i.e., serine proteases and kinases. We find that graphs based on almost-Delaunay edges significantly reduce the number of edges in the graph representation and hence present computational advantage, yet the patterns extracted from such graphs have a biological interpretation approximately equivalent to that of those extracted from distance based graphs.

7.
It is known that the backbone conformation of a protein can be reproduced with precision once a correct contact map (two-dimensional representation showing residue pairs in contact) is given as geometrical constraints. There is, however, no way to infer the correct contact map for a protein of unknown structure. We started with one-dimensional constraints using the quantity N14 (the number of neighboring residues within the radius of 14 Å). Since the plot of N14 along a chain shows a good correlation with the corresponding amino acid sequence, the N14 profile obtained from the X-ray structure is predictable from the sequence. Construction of backbone conformations under a given N14 profile was carried out in the following two steps: (1) a contact map from the N14 profile was produced by taking the product of N14 values of every two residues; (2) backbone conformations were generated by applying the distance geometry technique to distance constraints given by the contact map. If present, disulfide bonds in a protein, as well as the secondary structure, were treated as additional constraints, and both cases with or without the additional information were examined. The method was tested for 11 proteins of known structure, and the results indicated that the reproduced conformation was fairly good, using an X-ray structure for comparison, for small proteins of fewer than 80 residues. The basic assumption and effectiveness of the present method were compared with those of previous studies employing the geometrical constraint approach. It has become clear that the specific, one-dimensional information (e.g., N14 profile) is more effective than nonspecific, two-dimensional constraints, such as average interresidue distances between particular types of amino acids. © 1993 Wiley-Liss, Inc.
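
The N14 quantity and step (1) of this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: one representative point per residue, a residue is not counted as its own neighbor, and the 14 Å radius is treated as inclusive. The abstract does not say how the products are turned into contacts, so the sketch stops at the raw product matrix. The names `n14_profile` and `product_map` and the toy coordinates are hypothetical.

```python
import math


def n14_profile(coords, radius=14.0):
    """For each residue, count the other residues within `radius` angstroms."""
    return [
        sum(1 for j, b in enumerate(coords) if j != i and math.dist(a, b) <= radius)
        for i, a in enumerate(coords)
    ]


def product_map(profile):
    """Matrix whose (i, j) entry is the product of the N14 values of residues i and j."""
    return [[ni * nj for nj in profile] for ni in profile]


# Toy trace (assumed data): four residues spaced 10 angstroms apart on a line.
coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
profile = n14_profile(coords)
scores = product_map(profile)
print(profile)
print(scores[1][2])
```

In this toy trace the two interior residues each have two neighbors within 14 Å and the end residues have one, so the largest product falls on the interior pair.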

8.
Prediction of the location of structural domains in globular proteins   Total citations: 7, self-citations: 0, citations by others: 7
The location of structural domains in proteins is predicted from the amino acid sequence, based on the analysis of a computed contact map for the protein, the average distance map (ADM). Interactions between residues i and j in a protein are subdivided into several ranges, according to the separation |i-j| in the amino acid sequence. Within each range, average spatial distances between every pair of amino acid residues are computed from a database of known protein structures. Infrequently occurring pairs are omitted as being statistically insignificant. The average distances are used to construct a predicted ADM. The ADM is analyzed for the occurrence of regions with high densities of contacts (compact regions). Locations of rapid changes of density between various parts of the map are determined by the use of scanning plots of contact densities. These locations serve to pinpoint the distribution of compact regions. This distribution, in turn, is used to predict boundaries of domains in the protein. The technique provides an objective method for the location of domains both on a contact map derived from a known three-dimensional protein structure, the real distance map (RDM), and on an ADM. While most other published methods for the identification of domains locate them in the known three-dimensional structure of a protein, the technique presented here also permits the prediction of domains in proteins of unknown spatial structure, as the construction of the ADM for a given protein requires knowledge of only its amino acid sequence.

9.
Protein structure alignment is an important tool in many biological applications, such as protein evolution studies, protein structure modeling, and structure-based, computer-aided drug design. Protein structure alignment is also one of the most challenging problems in computational molecular biology, due to an infinite number of possible spatial orientations of any two protein structures. We study one of the most commonly used measures of pairwise protein structure similarity, defined as the number of pairs of atoms in two proteins that can be superimposed under a predefined distance cutoff. We prove that the expected running time of a recently published algorithm for optimizing this (and some other, derived measures of protein structure similarity) is polynomial.

10.
The three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that can better represent residue-residue and residue-solvent interactions is a crucial way to improve the prediction accuracy. The widely used contact energy functions mostly consider only the contact frequency between different types of residues; however, we find that the contact frequency also relates to the residue hydrophobic environment. Accordingly, we present an improved contact energy function that integrates the two factors and can more effectively reflect the influence of hydrophobic interaction on the stabilization of protein 3D structure. Furthermore, a fold recognition (threading) approach based on this energy function is developed. The testing results obtained with 20 randomly selected proteins demonstrate that, compared with common contact energy functions, the proposed energy function can improve the accuracy of the fold template prediction from 20% to 50%, and can also improve the accuracy of the sequence-template alignment from 35% to 65%.

11.
One of the goals of molecular bioinformatics is decoding amino acid sequences to extract information on the principles of protein folding. However, this is difficult to perform with standard bioinformatics techniques such as multiple sequence alignment. Thus, we propose a technique based on inter-residue average distance statistics to make predictions regarding the protein folding mechanisms of amino acid sequences. Our method involves constructing a kind of predicted contact map called an Average Distance Map (ADM) based on average distance statistics to pinpoint regions of possible folding nuclei for proteins. Only information on the amino acid sequence of a given protein is required for the present method. In this article, we summarize the results of studies using our method to analyze how specific protein sequences affect folding properties. In particular, we present studies on proteins in the phage lysozyme, such as the globin, fatty acid binding protein-like, and the cupredoxin-like fold families. In the present review, we characterize the 3D architectures of these proteins through the properties of the protein ADMs. Furthermore, we combine the information on the conserved residues within the regions predicted by the ADMs with our results obtained so far. Such information may help identify the folding characteristics of each protein. We discuss this possibility in the present review.

12.
The prediction of protein tertiary structure solely from the residue sequence (the so-called Protein Folding Problem) is one of the most challenging problems in Structural Bioinformatics. We focus on the protein residue contact map. When this map is assigned it is possible to reconstruct the 3D structure of the protein backbone. The general problem of recovering a set of 3D coordinates consistent with some given contact map is known as a unit-disk-graph realization problem and it has been recently proven to be NP-Hard. In this paper we describe a heuristic method (COMAR) that is able to reconstruct with an unprecedented rate (3-15 seconds) a 3D model that exactly matches the target contact map of a protein. Working with a non-redundant set of 1760 proteins, we find that the scoring efficiency of finding a 3D model very close to the protein native structure depends on the threshold value adopted to compute the protein residue contact map. Contact maps whose threshold values range from 10 to 18 Ångstroms allow reconstructing 3D models that are very similar to the protein's native structure.

13.
We report substantial improvements to the previously introduced automated NOE assignment and structure determination protocol known as PASD (Kuszewski et al. (2004) J Am Chem Soc 26:6258-6273). The improved protocol includes extensive analysis of input spectral data to create a low-resolution contact map of residues expected to be close in space. This map is used to obtain reasonable initial guesses of NOE assignment likelihoods which are refined during subsequent structure calculations. Information in the contact map about which residues are predicted to not be close in space is applied via conservative repulsive distance restraints which are used in early phases of the structure calculations. In comparison with the previous protocol, the new protocol requires significantly less computation time. We show results of running the new PASD protocol on six proteins and demonstrate that useful assignment and structural information is extracted on proteins of more than 220 residues. We show that useful assignment information can be obtained even in the case in which a unique structure cannot be determined.

14.
The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith-Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith-Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed-accuracy tradeoff in a number of popular protein structure alignment methods.
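
To make the quadratic baseline in this abstract concrete, here is a minimal sketch of an order-preserving dynamic program that counts the largest number of residue pairs lying within a distance cutoff, for two structures that are already superimposed. It is a simplified stand-in, with no gap penalties and no local-alignment reset, for the Smith-Waterman routine the abstract mentions, and it is not the subquadratic algorithm the paper presents. The function name `max_pairs_within_cutoff` and the toy coordinates are hypothetical.

```python
import math


def max_pairs_within_cutoff(a, b, cutoff):
    """Largest order-preserving set of residue pairs (a[i], b[j]) within `cutoff`."""
    n, m = len(a), len(b)
    # best[i][j] is the optimum for the prefixes a[:i] and b[:j].
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1 if math.dist(a[i - 1], b[j - 1]) <= cutoff else 0
            best[i][j] = max(
                best[i - 1][j],              # leave a[i-1] unaligned
                best[i][j - 1],              # leave b[j-1] unaligned
                best[i - 1][j - 1] + match,  # pair a[i-1] with b[j-1]
            )
    return best[n][m]


# Toy superimposed traces (assumed data), coordinates in angstroms.
struct_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
struct_b = [(0.5, 0.0, 0.0), (8.2, 0.0, 0.0)]
print(max_pairs_within_cutoff(struct_a, struct_b, cutoff=1.0))
```

The two nested loops over residues are the source of the running time proportional to the product of the two lengths; a search over superpositions would repeat this table fill once per candidate superposition.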

15.
Solis AD, Rackovsky S. Proteins, 2008, 71(3): 1071-1087
We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information, and if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence on their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have a high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials. Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.

16.
An alpha-helix and a beta-strand are said to be interactively packed if at least one residue in each of the secondary structural elements loses 10% of its solvent accessible contact area on association with the other secondary structural element. An analysis of all such 5,975 nonidentical alpha/beta units in protein structures, defined at ≤ 2.5 Å resolution, shows that the interaxial distance between the alpha-helix and the beta-strand is linearly correlated with the residue-dependent function, log[(V/nda)/n-int], where V is the volume of amino acid residues in the packing interface, nda is the normalized difference in solvent accessible contact area of the residues in packed and unpacked secondary structural elements, and n-int is the number of residues in the packing interface. The beta-sheet unit (beta u), defined as a pair of adjacent parallel or antiparallel hydrogen-bonded beta-strands, packing with an alpha-helix shows a better correlation between the interaxial distance and log(V/nda) for the residues in the packing interface. This packing relationship is shown to be useful in the prediction of interaxial distances in alpha/beta units using the interacting residue information of equivalent alpha/beta units of homologous proteins. It is, therefore, of value in comparative modeling of protein structures.

17.
Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold. Although this measure is in principle computationally hard to optimize, we show how it can in fact be computed with great accuracy for related proteins by integer linear programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds to the optimal alignment value. We also illustrate effective heuristics, such as local search and genetic algorithms. We were able to obtain for the first time optimal alignments for large similar proteins (about 1,000 residues and 2,000 contacts) and used the CMO measure to cluster proteins in families. The clusters obtained were compared to SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments which are off by at most 10% from the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact and how to choose this threshold in a sensible way.
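
The quantity being optimized in this abstract can be illustrated by scoring one fixed alignment. The sketch below counts the contacts of one protein that map, under a given residue-to-residue alignment, onto contacts of the other. It only evaluates an alignment; it does not search for the optimal one, which is the hard part addressed by the integer linear programming and heuristic methods the abstract describes. The function name `contact_overlap`, the representation of contacts as sets of index pairs (i, j) with i < j, and the toy data are assumptions.

```python
def contact_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A whose two residues are aligned to a contact of B.

    contacts_a, contacts_b: sets of index pairs (i, j) with i < j.
    alignment: dict mapping residue indices of A to residue indices of B.
    """
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            u, v = alignment[i], alignment[j]
            # Normalize the mapped pair to the (smaller, larger) convention.
            if (min(u, v), max(u, v)) in contacts_b:
                shared += 1
    return shared


# Toy contact maps and alignment (assumed data).
contacts_a = {(0, 1), (1, 2), (0, 3)}
contacts_b = {(0, 1), (1, 2), (2, 4)}
alignment = {0: 0, 1: 1, 2: 2, 3: 4}
print(contact_overlap(contacts_a, contacts_b, alignment))
```

Here the contacts (0, 1) and (1, 2) are preserved by the alignment, while (0, 3) maps to (0, 4), which is not a contact in the second map.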

18.
ABSTRACT: BACKGROUND: Identification of protein structural cores requires isolation of sets of proteins that all share the same subset of structural motifs. In the context of an ever-growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow for efficient identification of such sets of proteins. RESULTS: A pair of 3D structures is declared similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented in a graph of similarities where a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Therefore, the classification of proteins into structural families can be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may group in the same cluster a subset of 3D structures that do not share a common substructure. To overcome this drawback we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three involved protein structures do have some common substructure. We propose a modification algorithm that eliminates edges from the original graph of similarities and outputs a reduced graph in which no ternary constraints are violated. Our proposal is first to build a graph of similarities, then to reduce the graph according to the modification algorithm, and finally to apply a standard graph clustering algorithm to the reduced graph. We applied this method to ASTRAL-40 non-redundant protein domains, identifying significant pairwise similarities with Yakusa, a program devised for rapid 3D structure alignments. CONCLUSIONS: We show that filtering similarities prior to a standard graph-based clustering process by applying ternary similarity constraints i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph-based clustering algorithms according to the reference classification SCOP.

19.
Protein threading using PROSPECT: design and evaluation   Total citations: 14, self-citations: 0, citations by others: 14
Xu Y, Xu D. Proteins, 2000, 40(3): 343-354
The computer system PROSPECT for protein fold recognition using the threading method is described and evaluated in this article. For a given target protein sequence and a template structure, PROSPECT guarantees to find a globally optimal threading alignment between the two. The scoring function for a threading alignment employed in PROSPECT consists of four additive terms: i) a mutation term, ii) a singleton fitness term, iii) a pairwise-contact potential term, and iv) alignment gap penalties. The current version of PROSPECT considers pair contacts only between core (alpha-helix or beta-strand) residues and alignment gaps only in loop regions. PROSPECT finds a globally optimal threading efficiently when pairwise contacts are considered only between residues that are spatially close (7 Å or less between the C(beta) atoms in the current implementation). On a test set consisting of 137 pairs of target-template proteins, each pair being from the same superfamily and having sequence identity

20.
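A method based on a graph-theoretic model is proposed that uses an H coefficient to classify protein structures into H-knot and NH-knot types. The relationship between the spatial distances of Cα atoms that are not adjacent in sequence and the spatial distances of Cα atoms that are adjacent in sequence is discussed. Applying the method to 66 single-chain protein structures from the PDB shows that the H-knot type accounts for 18.2% of them. H-knots occur in a relatively high proportion of all-α structures and in a relatively low proportion of all-β structures, so H-knots tend to occur in protein structures that contain α-helices.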
