Similar literature
1.
Comparison of protein structures is important for revealing the evolutionary relationship among proteins, predicting protein functions and predicting protein structures. Many methods have been developed in the past to align two or multiple protein structures. Despite the importance of this problem, rigorous mathematical or statistical frameworks have seldom been pursued for general protein structure comparison. One notable issue in this field is that with many different distances used to measure the similarity between protein structures, none of them are proper distances when protein structures of different sequences are compared. Statistical approaches based on those non-proper distances or similarity scores as random variables are thus not mathematically rigorous. In this work, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. Using an elastic Riemannian metric on spaces of curves, geodesic distance, a proper distance on spaces of curves, can be computed for any two protein structures. In this framework, protein structures can be treated as random variables on the shape manifold, and means and covariance can be computed for populations of protein structures. Furthermore, these moments can be used to build Gaussian-type probability distributions of protein structures for use in hypothesis testing. The covariance of a population of protein structures can reveal the population-specific variations and be helpful in improving structure classification. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions. We show that our method performs comparably with commonly used methods in protein structure classification on a large manually annotated data set.
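The abstract does not state which elastic metric is used. One widely used elastic framework represents a curve c(t) by its square-root velocity function q(t) = c'(t)/sqrt(|c'(t)|); after scaling q to unit norm, two shapes lie on a sphere and their distance is the arc length arccos(<q1, q2>). The sketch below is a simplified, hypothetical illustration of that idea: it assumes both curves (e.g., Cα traces) have the same number of points, and it omits the optimization over rotations and reparametrizations that a full geodesic shape distance requires.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]

def srvf(curve: List[Point]) -> List[Point]:
    """Discrete square-root velocity function: q_i = v_i / sqrt(|v_i|),
    where v_i is the finite-difference velocity between consecutive points."""
    q = []
    for a, b in zip(curve, curve[1:]):
        v = tuple(bj - aj for aj, bj in zip(a, b))
        speed = math.sqrt(sum(c * c for c in v))
        scale = 1.0 / math.sqrt(speed) if speed > 0 else 0.0
        q.append(tuple(c * scale for c in v))
    return q

def normalize(q: List[Point]) -> List[Point]:
    """Scale the SRVF to unit L2 norm (removes overall curve size)."""
    norm = math.sqrt(sum(c * c for p in q for c in p))
    return [tuple(c / norm for c in p) for p in q]

def preshape_distance(c1: List[Point], c2: List[Point]) -> float:
    """Arc-length distance between unit-norm SRVFs.
    Hypothetical simplification: no rotation or reparametrization alignment."""
    if len(c1) != len(c2):
        raise ValueError("curves must have the same number of points")
    q1, q2 = normalize(srvf(c1)), normalize(srvf(c2))
    inner = sum(a * b for p1, p2 in zip(q1, q2) for a, b in zip(p1, p2))
    return math.acos(max(-1.0, min(1.0, inner)))  # clamp against rounding
```

Because only velocities enter q and q is normalized, translating or uniformly scaling a curve leaves this distance at zero; rotations are not factored out here.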

2.
3.
Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ~1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/.

4.
5.
Structural biology and structural genomics are expected to produce many three-dimensional protein structures in the near future. Each new structure raises questions about its function and evolution. Correct functional and evolutionary classification of a new structure is difficult for distantly related proteins and error-prone using simple statistical scores based on sequence or structure similarity. Here we present an accurate numerical method for the identification of evolutionary relationships (homology). The method is based on the principle that natural selection maintains structural and functional continuity within a diverging protein family. The problem of different rates of structural divergence between different families is solved by first using structural similarities to produce a global map of folds in protein space and then further subdividing fold neighborhoods into superfamilies based on functional similarities. In a validation test against a classification by human experts (SCOP), 77% of homologous pairs were identified with 92% reliability. The method is fully automated, allowing fast, self-consistent and complete classification of large numbers of protein structures. In particular, the discrimination between analogy and homology of close structural neighbors will lead to functional predictions while avoiding overprediction.

6.
In this paper, we present a new scheme named ProtClass for automatic classification of three-dimensional (3D) protein structures. It is a dedicated and unified multiclass classification scheme. Neither detailed structural alignment nor multiple binary classifications are required in this scheme. We adopt a nearest neighbor-based classification strategy. We use a filter-and-refine scheme. In the first step, we filter out the improbable answers using the precalculated parameters from the training data. In the second, we perform a relatively more detailed nearest neighbor search on the remaining answers. We use very concise and effective encoding schemes of the 3D protein structures in both steps. We compare our proposed method against two other dedicated protein structure classification schemes, namely SGM and CPMine. The experimental results show that ProtClass is slightly better in accuracy than SGM and much faster. In comparison with CPMine, ProtClass is much more accurate, while their running times are about the same. We also compare ProtClass against a structural alignment-based classification scheme named DALI, which was found to be more accurate but extremely slow. The software is available upon request from the authors. Supplementary information on the ProtClass method can be found at: http://xena1.ddns.comp.nus.edu.sg/~genesis/PClass.htm.
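The abstract does not describe ProtClass's encodings or its precalculated parameters. As a hypothetical illustration of a filter-and-refine nearest-neighbor classifier, the sketch below assumes each structure is already encoded as a fixed-length feature vector, and it uses each class's centroid and radius (largest member-to-centroid distance) as the precalculated parameters. The filter is exact by the triangle inequality, so pruning never changes the nearest-neighbor answer.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def precompute(train: List[Tuple[Vector, str]]):
    """Per class: centroid and radius (max member distance to centroid)."""
    members: Dict[str, List[Vector]] = defaultdict(list)
    for vec, label in train:
        members[label].append(vec)
    params = {}
    for label, vecs in members.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        radius = max(dist(v, centroid) for v in vecs)
        params[label] = (centroid, radius)
    return members, params

def classify(query: Vector, members, params) -> str:
    # Filter: bound the distance from the query to any member of each class.
    # A class whose lower bound exceeds the smallest upper bound cannot
    # contain the nearest neighbor, so it is discarded.
    bounds = {}
    for label, (centroid, radius) in params.items():
        d = dist(query, centroid)
        bounds[label] = (max(0.0, d - radius), d + radius)
    best_upper = min(upper for _, upper in bounds.values())
    candidates = [lab for lab, (lower, _) in bounds.items() if lower <= best_upper]
    # Refine: exact nearest-neighbor search over the surviving classes only.
    best_label, best_d = None, math.inf
    for label in candidates:
        for vec in members[label]:
            d = dist(query, vec)
            if d < best_d:
                best_label, best_d = label, d
    return best_label
```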

7.
The intron/exon organization of the human gene for glycogen phosphorylase has been determined. The segments of the polypeptide chain that correspond to the 19 exons of the gene are examined for relationships between the three-dimensional structure of the protein and the gene structure. Only weak correlations are observed between domains of phosphorylase and exons. The nucleotide binding domains that are found in phosphorylase and other glycolytic enzymes are examined for relationships between exons of the genes and structures of the domains. When mapped to the three-dimensional structures, the intron/exon boundaries are shown to be widely distributed in this family of protein domains.

8.
The classification of protein structures is an important and still outstanding problem. The purpose of this paper is threefold. First, we utilize a relation between the Tutte and homfly polynomials to show that the Alexander-Conway polynomial can be algorithmically computed for a given planar graph. Second, as special cases of planar graphs, we use polymer graphs of protein structures. More precisely, we use three building blocks of the three-dimensional protein structure (alpha-helix, antiparallel beta-sheet, and parallel beta-sheet) and calculate the Tutte polynomials of their corresponding polymer graphs analytically by providing recurrence equations for all three secondary structure elements. Third, we present numerical results comparing our analytical calculations with the numerical results of our algorithm, not only to test consistency but also to demonstrate that all assigned polynomials are unique labels of the secondary structure elements. This paves the way for an automatic classification of protein structures.
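The paper's own recurrences are specific to the polymer graphs of helices and sheets and are not given in the abstract. For orientation, the general deletion-contraction recurrence that defines the Tutte polynomial can be sketched as follows (loop: T(G) = y·T(G − e); bridge: T(G) = x·T(G/e); otherwise T(G) = T(G − e) + T(G/e)). This is a generic, exponential-time illustration for small multigraphs, not the authors' algorithm, and it does not cover the homfly or Alexander-Conway step.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]
Poly = Counter  # maps (i, j) -> coefficient of x**i * y**j

def _connected(edges: List[Edge], u: int, v: int) -> bool:
    """Is v reachable from u using only the given edges?"""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, stack = {u}, [u]
    while stack:
        node = stack.pop()
        if node == v:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def _shift(p: Poly, dx: int, dy: int) -> Poly:
    """Multiply a polynomial by x**dx * y**dy."""
    return Counter({(i + dx, j + dy): c for (i, j), c in p.items()})

def tutte(edges: List[Edge]) -> Poly:
    """Tutte polynomial of a multigraph (edge list) via deletion-contraction."""
    if not edges:
        return Counter({(0, 0): 1})
    (u, v), rest = edges[0], edges[1:]
    if u == v:                                  # loop: y * T(G - e)
        return _shift(tutte(rest), 0, 1)
    # G / e: merge vertex v into u in the remaining edges.
    contracted = [(u if a == v else a, u if b == v else b) for a, b in rest]
    if not _connected(rest, u, v):              # bridge: x * T(G / e)
        return _shift(tutte(contracted), 1, 0)
    return tutte(rest) + tutte(contracted)      # T(G - e) + T(G / e)
```

For a triangle this yields x² + x + y; evaluating at x = y = 1 gives 3, the number of spanning trees, a standard consistency check.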

9.
In the postgenomic era, bioinformatic analysis of sequence similarity is an immensely powerful tool to gain insight into evolution and protein function. Over long evolutionary distances, however, sequence-based methods fail as the similarities become too low for phylogenetic analysis. Macromolecular structure generally appears better conserved than sequence, but clear models for how structure evolves over time are lacking. The exponential growth of three-dimensional structural information may allow novel structure-based methods to drastically extend the evolutionary time scales amenable to phylogenetics and functional classification of proteins. To this end, we analyzed 80 structures from the functionally diverse ferritin-like superfamily. Using evolutionary networks, we demonstrate that structural comparisons can delineate and discover groups of proteins beyond the "twilight zone" where sequence similarity does not allow evolutionary analysis, suggesting that considerable and useful evolutionary signal is preserved in three-dimensional structures.

10.
Shestopalov BV. Biofizika, 2007, 52(5): 804-811.
Using X-ray diffraction data for 100 three-dimensional structures of 26 proteins that are distributed uniformly among the three main classes of the alpha-helix/beta-structure classification and lack potential polyanion regions, 154 comparisons of alpha-helix and beta-structure content were made between structures obtained at different pH values of the medium; across all these proteins, the pH values span the range from 1.5 to 12.0. No significant influence of the pH of the medium on the size and location of alpha-helices and beta-strands was found. Consequently, it is suggested that, for protein structures in a crystal, the alpha-helical/beta-structural backbone does not depend on the pH of the medium (except when the whole protein or part of it can become a polyion, so that electrostatic interactions either hinder or favour the formation of regular structures), and that the conformational properties of ionizable amino acids are independent of the pH of the medium. It is unclear whether these assumptions can be extended to proteins in solution, because solution-structure data are available for only one protein. These results can be used in investigations of protein structure, in protein engineering, and in the creation of specialized databanks of protein structures.

11.
Protein structural annotation and classification is an important and challenging problem in bioinformatics. Research towards analysis of sequence-structure correspondences is critical for better understanding of a protein's structure, function, and its interaction with other molecules. Clustering of protein domains based on their structural similarities provides valuable information for protein classification schemes. In this article, we attempt to determine whether structure information alone is sufficient to adequately classify protein structures. We present an algorithm that identifies regions of structural similarity within a given set of protein structures, and uses those regions for clustering. In our approach, called STRALCP (STRucture ALignment-based Clustering of Proteins), we generate detailed information about global and local similarities between pairs of protein structures, identify fragments (spans) that are structurally conserved among proteins, and use these spans to group the structures accordingly. We also provide a web server at http://as2ts.llnl.gov/AS2TS/STRALCP/ for selecting protein structures, calculating structurally conserved regions and performing automated clustering.

12.
A distance measure that reflects the dissimilarity among structures has been developed on the basis of the three-dimensional structures of similar proteins; it is totally independent of sequence in the sense that only the relative spatial positions of main-chain alpha-carbon atoms need be known. This procedure leads to phyletic relationships that are in general correlated with the sequence phylogenies based on residue type. Such relationships among known protein three-dimensional structures are also a useful aid to their classification and selection in knowledge-based modeling using homologous structures. We have applied this approach to six homologous sets of proteins: immunoglobulin fragments, globins, cytochromes c, serine proteinases, eye-lens gamma crystallins, and dinucleotide-binding domains.
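The abstract does not give the formula of this distance measure. As one hypothetical example of a dissimilarity that needs only Cα positions (no residue types), the sketch below computes the distance-matrix RMSD (dRMSD): the root-mean-square difference between corresponding intramolecular Cα-Cα distances. It assumes the two traces have already been matched position by position (for example, by a structural alignment); it is a stand-in, not the authors' measure.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float, float]

def drmsd(a: List[Point], b: List[Point]) -> float:
    """Distance-matrix RMSD between two equally long, position-matched Cα traces.
    Uses only intramolecular distances, so no superposition is needed."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched traces with at least 2 atoms")
    diffs = [(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
             for i, j in combinations(range(len(a)), 2)]
    return math.sqrt(sum(diffs) / len(diffs))
```

Because it compares internal distances, dRMSD is unchanged by rigid rotations and translations of either structure; a matrix of such pairwise values could then feed a standard distance-based tree-building method.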

13.
Material remains of ancestor nucleotides and proteins are largely unavailable, thus sequence comparison among homologous genes in present-day organisms forms the core of current knowledge of molecular evolution. Variation in protein three-dimensional structure is a basis for functional diversity. Studying the evolution of three-dimensional structures in related proteins would significantly improve our understanding of protein evolution and function. A protein may contain ancestor conformations that have been allosterically suppressed by evolutionarily added structures. Using monoclonal antibody probes to detect such conformations in proteins after removing the suppressor structure, our study demonstrates three-dimensional structural evidence for the evolutionary relationship between troponin I and troponin T, two subunits of the troponin complex in the Ca2+-regulatory system of striated muscle, and among their muscle type-specific isoforms. The experimental data show the feasibility of detecting evolutionarily suppressed history-telling structural states in proteins by removing conformational modulator segments added during evolution. In addition to identifying structural modifications that were critical to the emergence of diverged proteins, investigating this novel mode of evolution will help us to understand the origin and functional potential of protein structures.

14.
Similarity of protein structures has been analyzed using three-dimensional Delaunay triangulation patterns derived from the backbone representation. It has been found that structurally related proteins have a common spatial invariant part, a set of tetrahedrons, mathematically described as a common spatial subgraph volume of the three-dimensional contact graph derived from Delaunay tessellation (DT). Based on this property of protein structures, we present a novel common volume superimposition (TOPOFIT) method to produce structural alignments. Structural alignments are usually evaluated by the number of equivalent (aligned) positions (N(e)) and the corresponding root mean square deviation (RMSD). The superimposition of the DT patterns allows one to uniquely identify a maximal common number of equivalent residues in the structural alignment. In other words, TOPOFIT identifies a feature point on the RMSD/N(e) curve, a topomax point, up to which the topologies of the two structures correspond to each other, including backbone and interresidue contacts, whereas a growing number of mismatches between the DT patterns occurs at larger RMSD (N(e)) beyond the topomax point. It has been found that the topomax point is present in all alignments from different protein structural classes; therefore, the TOPOFIT method identifies common, invariant structural parts between proteins. The alignments produced by the TOPOFIT method correlate well with alignments produced by other current methods. This novel method opens new opportunities for the comparative analysis of protein structures and for more detailed studies on understanding the molecular principles of tertiary structure organization and functionality. The TOPOFIT method also helps to detect conformational changes and topological differences in variable parts, which are particularly important for studies of variations in active/binding sites and protein classification.
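To make the RMSD/N(e) curve concrete, here is a simplified sketch under one stated assumption: the two structures are already superimposed and the per-residue deviations (in Å) of the aligned pairs are known, with the superposition held fixed. Taking the N(e) best-fitting pairs for each N(e) gives a non-decreasing RMSD curve. Methods such as TOPOFIT re-derive the superposition and locate the topomax point from Delaunay patterns, which this sketch does not attempt.

```python
import math
from typing import List, Tuple

def rmsd_ne_curve(deviations: List[float]) -> List[Tuple[int, float]]:
    """Return (N_e, RMSD) pairs, where RMSD is computed over the N_e
    aligned residue pairs with the smallest deviations (fixed superposition)."""
    squared = sorted(d * d for d in deviations)
    curve, total = [], 0.0
    for n, s in enumerate(squared, start=1):
        total += s
        curve.append((n, math.sqrt(total / n)))
    return curve
```

Since the mean of the k smallest squared deviations can only grow as k grows, the curve is monotone, which is why a distinguished point on it (such as the topomax point) has to be identified by an additional criterion.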

15.
Proteins are composed of evolutionary units called domains; the majority of proteins consist of at least two domains. These domains and nature of their interactions determine the function of the protein. The roles that combinations of domains play in the formation of the protein repertoire have been found by analysis of domain assignments to genome sequences. Additional findings on the geometry of domains have been gained from examination of three-dimensional protein structures. Future work will require a domain-centric functional classification scheme and efforts to determine structures of domain combinations.

16.
R M Sweet. Biopolymers, 1986, 25(8): 1565-1577.
Short segments of polypeptide, from a protein for which the primary sequence but not the three-dimensional structure is known, are compared to a library of known structures. The basis of comparison is the probability with which residues in the unknown segment might substitute through evolution for residues in segments of known structure. In test cases, segments from known structures that are similar in sequence to those from a protein treated as unknown are often found to be similar in three-dimensional structure to one another and to the true structure of the “unknown” segment. This provides a basis for prediction of the local configuration (secondary structure) of polypeptides.
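A toy version of this library lookup can be sketched as follows. The substitution scores below are hypothetical placeholders (identity = 2, both residues hydrophobic = 1, otherwise -1) standing in for the evolutionary substitution probabilities the abstract refers to, and the two-entry library is invented for illustration; the predicted local structure is simply that of the best-scoring library segment.

```python
from typing import List, Tuple

# Hypothetical toy substitution scores; a real application would use
# log-odds values derived from observed evolutionary substitutions.
HYDROPHOBIC = set("AVILMFWC")

def sub_score(a: str, b: str) -> int:
    if a == b:
        return 2
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return 1
    return -1

def predict_local_structure(segment: str, library: List[Tuple[str, str]]) -> str:
    """Return the secondary-structure string of the best-scoring library entry.
    Each entry is a (sequence, structure) pair of the same length as `segment`."""
    best = max(library,
               key=lambda entry: sum(sub_score(a, b) for a, b in zip(segment, entry[0])))
    return best[1]
```

Usage, with H for helix and E for strand in the invented library: `predict_local_structure("AEILKR", [("AELLKK", "HHHHHH"), ("TVTVTV", "EEEEEE")])` selects the helical entry.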

17.
Shestopalov BV. Tsitologiia, 2003, 45(7): 702-706.
The calculation of protein three-dimensional structure from the amino acid sequence is a fundamental problem to be solved. This paper presents the principles of the code theory of protein secondary structure and their consequence, the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The bases of the theory are: 1) the term "secondary structure" is assigned to the conformation stabilized only by nearest-neighbour (intraresidual) and middle-range (at a distance no greater than that between residues i and i + 5) interactions; 2) the secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments; 3) alpha-helices, beta-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2) and (i, i + 1), according to the numbers of residues per period, 3.6, 2 and 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability; 6) overlapping structurons of one and the same structure generate longer segments of this structure; 7) overlapping of structurons of different structures is forbidden, and therefore codon selection is required; this selection is hierarchical; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. Two kinds of model can be constructed on the basis of the theory: a physical one, using physical properties of amino acid residues, and a statistical one, using the results of statistical analysis of a large body of structural data.
Some evident consequences of the theory are: a) the theory can be used to calculate the secondary structure from the amino acid sequence as a partial solution of the problem of calculating protein three-dimensional structure from the amino acid sequence, and the calculated secondary structure and codon strength distribution can be used to simulate the next step of protein folding; b) one can propose that the same secondary structures can fold into different tertiary structures and, vice versa, different secondary structures can fold into the same tertiary structures, provided codon distributions are also considered; c) codons can be considered the first elements of a protein three-dimensional structure language.
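Point 3) of the theory can be made concrete by enumerating the candidate codons in a sequence: every (i, i + 4) pair for alpha-helix, every (i, i + 2) pair for beta-strand and every (i, i + 1) pair for coil. The sketch below does only that. The 21 strength types, the hierarchical codon selection and the six code variants are not specified in the abstract, so any scoring of these pairs would be an assumption and is deliberately left out.

```python
from typing import Dict, List, Tuple

# Spacing between the two residues of a codon for each structure type,
# as stated in the abstract: helix (i, i+4), strand (i, i+2), coil (i, i+1).
SPACING = {"alpha": 4, "beta": 2, "coil": 1}

def codons(sequence: str) -> Dict[str, List[Tuple[int, str]]]:
    """List every candidate codon as (start index, residue pair) per structure type."""
    result = {}
    for kind, k in SPACING.items():
        result[kind] = [(i, sequence[i] + sequence[i + k])
                        for i in range(len(sequence) - k)]
    return result
```

For "ACDEFG" this gives two helix codons, (0, "AF") and (1, "CG"), four strand codons and five coil codons; overlapping codons of the same type would then merge into longer segments per point 6).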

18.
19.
Zemla A. Nucleic Acids Research, 2003, 31(13): 3370-3374.
We present the LGA (Local-Global Alignment) method, designed to facilitate the comparison of protein structures or fragments of protein structures in sequence dependent and sequence independent modes. The LGA structure alignment program is available as an online service at http://PredictionCenter.llnl.gov/local/lga. Data generated by LGA can be successfully used in a scoring function to rank the level of similarity between two structures and to allow structure classification when many proteins are being analyzed. LGA also allows the clustering of similar fragments of protein structures.
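The abstract does not define the scoring function. One familiar score of this kind is a GDT_TS-style value: the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues lying within the cutoff. The sketch below is a simplified assumption-laden version: it takes per-residue deviations under a single fixed superposition and counts a residue when its deviation is at most the cutoff, whereas a full GDT calculation searches superpositions separately for each cutoff. It is not LGA's implementation.

```python
from typing import Iterable, Sequence

def gdt_ts_fixed(deviations: Sequence[float],
                 cutoffs: Iterable[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average, over distance cutoffs (Å), of the percentage of residues whose
    deviation under one fixed superposition is at most the cutoff."""
    n = len(deviations)
    cutoffs = list(cutoffs)
    fractions = [sum(d <= c for d in deviations) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

For deviations of 0.5, 1.5, 3.0 and 10.0 Å, the four fractions are 1/4, 2/4, 3/4 and 3/4, giving a score of 56.25.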

20.
To study local structures in proteins, we previously developed an autoassociative artificial neural network (autoANN) and clustering tool to discover intrinsic features of macromolecular structures. The hidden unit activations computed by the trained autoANN are a convenient low-dimensional encoding of the local protein backbone structure. Clustering these activation vectors results in a unique classification of protein local structural features called Structural Building Blocks (SBBs). Here we describe application of this method to a larger database of proteins, verification of the applicability of this method to structure classification, and subsequent analysis of amino acid frequencies and several commonly occurring patterns of SBBs. The SBB classification method has several interesting properties: 1) it identifies the regular secondary structures, α helix and β strand; 2) it consistently identifies other local structure features (e.g., helix caps and strand caps); 3) strong amino acid preferences are revealed at some positions in some SBBs; and 4) distinct patterns of SBBs occur in the “random coil” regions of proteins. Analysis of these patterns identifies interesting structural motifs in the protein backbone structure, indicating that SBBs can be used as “building blocks” in the analysis of protein structure. This type of pattern analysis should increase our understanding of the relationship between protein sequence and local structure, especially in the prediction of protein structures. © 1997 Wiley-Liss, Inc.
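The abstract does not say which clustering algorithm groups the activation vectors into SBBs. As a generic stand-in, the sketch below runs Lloyd's k-means on low-dimensional vectors, using the first k points as deterministic initial centroids; the example vectors are hypothetical 2-D points rather than real autoANN activations.

```python
import math
from typing import List, Sequence, Tuple

def kmeans(points: List[Sequence[float]], k: int,
           iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Deterministic initialization keeps the sketch reproducible; in practice k-means is sensitive to initialization, so multiple restarts or a seeding scheme such as k-means++ are common choices.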

