Similar articles
1.
2.
CORA is a suite of programs for multiply aligning and analyzing protein structural families to identify the consensus positions and capture their most conserved structural characteristics (e.g., residue accessibility, torsional angles, and global geometry as described by inter-residue vectors/contacts). Knowledge of these structurally conserved positions, which are mostly in the core of the fold, and of their properties significantly improves the identification and classification of newly determined relatives. Information is encoded in a consensus three-dimensional (3D) template, and relatives are found by a sensitive alignment method that employs a new scoring scheme based on conserved residue contacts. By encapsulating these critical "core" features, templates perform more reliably in recognizing distant structural relatives than searches with representative structures. Parameters for 3D-template generation and alignment were optimized for each structural class (mainly-alpha, mainly-beta, alpha-beta), using representative superfold families. For all families selected, the templates gave significant improvements in sensitivity and selectivity in recognizing distant structural relatives. Furthermore, since templates contain less than 70% of fold positions and compare fewer positions when aligning structures, scans are at least an order of magnitude faster than scans using selected structures. CORA was subsequently tested on eight other broad structural families from the CATH database. Diagnostic plots are generated automatically and provide qualitative assistance for classifying newly determined relatives. They are demonstrated here by application to the large globin-like fold family. CORA templates for both homologous superfamilies and fold families will be stored in CATH and used to improve the classification and analysis of newly determined structures.

3.
4.
5.
We identified key residues from the structural alignment of families of protein domains from SCOP, which we represented in the form of sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to automatically identify key residues based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences from the SWISSPROT database was assessed by receiver operating characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are potently diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%-30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
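The ROC50 score quoted in the abstract above can be illustrated with a small sketch. It assumes the commonly used definition of ROC-n (the area under the ROC curve up to the first n false positives, normalized by n times the number of true family members); the abstract does not say which variant the authors used. The function name `roc_n` and the padding rule for short hit lists are illustrative assumptions.

```python
def roc_n(ranked_labels, n=50, total_positives=None):
    """ROC-n score for a ranked hit list (hypothetical helper, not from the paper).

    ranked_labels: booleans ordered best score first; True marks a family member.
    total_positives: number of true members in the database; defaults to the
    number of True entries in the list.
    """
    if total_positives is None:
        total_positives = sum(ranked_labels)
    if total_positives == 0:
        return 0.0
    true_seen = 0
    false_seen = 0
    area = 0
    for is_member in ranked_labels:
        if is_member:
            true_seen += 1
        else:
            # Each false positive contributes the number of true hits ranked above it.
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break
    # Assumption: if the list holds fewer than n false positives, the missing
    # ones are treated as ranking below every listed true hit.
    area += (n - false_seen) * true_seen
    return area / (n * total_positives)
```

A score of 1.0 means every family member ranks above the first false positive; a score of 0.0 means none is found before the n-th false positive.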

6.
This paper describes a novel computer graphics tool for predicting protein structures. The method is based on structural profiles, which are plots of hydrophobicity, parameters used for secondary structure prediction, or other residue-specific traits against sequence number. Similar structural profiles can indicate similar tertiary structures in the absence of sequence homology. The profiles of reference proteins with known structure can be used for prediction. In the method presented here, structural profiles are compared by interactive computer graphics, using the program Multiplot. As a test, a structural profile comparison of several proteins known to have similar 3D structures is presented. Comparison of structural profiles detects similar folding of the two domains of rhodanese, which was not easily detected by sequence homology.
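As a rough illustration of what a structural profile of this kind looks like numerically, the sketch below computes a sliding-window hydropathy profile and a simple correlation between two profiles. It is not the Multiplot program: the Kyte-Doolittle scale, the window length of 7, and the Pearson comparison are all assumptions chosen for the example, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here as one possible residue-specific trait.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def structural_profile(sequence, window=7, scale=KD):
    """Sliding-window mean of `scale` along `sequence`.

    Entry k averages residues k .. k + window - 1. Non-standard residue
    letters raise KeyError. Returns [] if the sequence is shorter than the window.
    """
    values = [scale[res] for res in sequence.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def profile_correlation(p, q):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_p = sum(p) / len(p)
    mean_q = sum(q) / len(q)
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0
    return cov / (var_p * var_q) ** 0.5
```

Plotting `structural_profile(seq)` against sequence number gives the kind of curve the abstract describes. Comparing proteins of different lengths would additionally need an alignment or offset search, which this sketch leaves out.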

7.
MOTIVATION: A major goal in structural genomics is to enrich the catalogue of proteins whose 3D structures are known. In an attempt to address this problem we mapped over 10 000 proteins with solved structures onto a graph of all Swissprot protein sequences (release 36, approximately 73 000 proteins) provided by ProtoMap, with the goal of sorting proteins according to their likelihood of belonging to new superfamilies. We hypothesized that proteins within neighbouring clusters tend to share common structural superfamilies or folds. If true, the likelihood of finding new superfamilies increases in clusters that are distal from other solved structures within the graph. RESULTS: We defined an order relation between unsolved proteins according to their 'distance' from solved structures in the graph, and sorted approximately 48 000 proteins. Our list can be partitioned into three groups: approximately 35 000 proteins sharing a cluster with at least one known structure; approximately 6500 proteins in clusters with no solved structure but with neighbouring clusters containing known structures; and a third group containing the rest of the proteins, approximately 6100 (in 1274 clusters). We tested the quality of the order relation using thousands of recently solved structures that were not included when the order was defined. The tests show that our order is significantly better (P-value approximately 10^-5) than a random order. More interestingly, the order within the union of the second and third groups, and the order within the third group alone, perform better than random (P-values: 0.0008 and 0.15, respectively) and are better than alternative orders created using PSI-BLAST. Herein, we present a method for selecting targets to be used in structural genomics projects. AVAILABILITY: A list of proteins to be used for target selection, combined with a set of biological filters for narrowing down potential targets, is available at http://www.protarget.cs.huji.ac.il.

8.
9.
10.
MOTIVATION: Membrane domain prediction has recently been re-evaluated by several groups, suggesting that the accuracy of existing methods is still rather limited. In this work, we revisit this problem and propose novel methods for prediction of alpha-helical as well as beta-sheet transmembrane (TM) domains. The new approach is based on a compact representation of an amino acid residue and its environment, which consists of predicted solvent accessibility and secondary structure of each amino acid. A recently introduced method for solvent accessibility prediction trained on a set of soluble proteins is used here to indicate segments of residues that are predicted not to be accessible to water and, therefore, may be 'buried' in the membrane. While evolutionary profiles in the form of a multiple alignment are used to derive these simple 'structural profiles', they are not used explicitly for the membrane domain prediction, and the overall number of parameters in the model is significantly reduced. This offers the possibility of a more reliable estimation of the free parameters in the model with a limited number of experimentally resolved membrane protein structures. RESULTS: Using cross-validated training on available sets of structurally resolved and non-redundant alpha and beta membrane proteins, we demonstrate that membrane domain prediction methods based on such a compact representation outperform approaches that explicitly utilize evolutionary profiles and multiple alignments. Moreover, using an external evaluation by the TMH Benchmark server, we show that our final prediction protocol for TM helix prediction is competitive with the state-of-the-art methods, achieving per-residue accuracy of approximately 89% and per-segment accuracy of approximately 80% on the set of high-resolution structures used by the TMH Benchmark server. At the same time, the observed rates of confusion with signal peptides and globular proteins are the lowest among the tested methods. The new method is available online at http://minnou.cchmc.org.

11.
Wolff K, Vendruscolo M, Porto M. Gene, 2008, 422(1-2): 47-51
We discuss a computational approach for reconstructing the native structures of proteins from the knowledge of a structural profile, namely the first eigenvector of the contact map of the native structure itself. The procedure consists of carrying out Monte Carlo simulations of a tube model of the protein structure with an energy bias towards the target structural profile. We present the reconstruction of two small proteins and address problems arising in the reconstruction of larger proteins. Our results indicate that an accurate physico-chemical energy function should be used in conjunction with the structural profile bias in order to achieve accurate reconstructions.
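To make the structural profile in the abstract above concrete, the following sketch builds a binary contact map from residue coordinates and extracts its first (principal) eigenvector by power iteration. The 8 Å cutoff, the use of one coordinate per residue, and both function names are assumptions for illustration; the abstract does not specify how the authors defined contacts. The Monte Carlo reconstruction itself is not shown.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Binary contact map: residues i != j are in contact if within `cutoff`.

    coords: one (x, y, z) tuple per residue. The cutoff value is an assumption.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def principal_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Unit-norm eigenvector of the largest eigenvalue of a symmetric,
    non-negative matrix, found by power iteration.

    The iteration uses (matrix + identity): the shift leaves the eigenvectors
    unchanged but avoids oscillation on bipartite contact graphs.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v
```

Each component of the returned vector is a per-residue value, so the result is a one-dimensional profile along the sequence; residues with many well-connected contacts receive larger components.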

12.
13.
The roughness of the protein energy surface poses a significant challenge to search algorithms that seek to obtain a structural characterization of the native state. Recent research seeks to bias search toward near-native conformations through one-dimensional structural profiles of the protein native state. Here we investigate the effectiveness of such profiles in a structure prediction setting for proteins of various sizes and folds. We pursue two directions. We first investigate the contribution of structural profiles in comparison to or in conjunction with physics-based energy functions in providing an effective energy bias. We conduct this investigation in the context of Metropolis Monte Carlo with fragment-based assembly. Second, we explore the effectiveness of structural profiles in providing projection coordinates through which to organize the conformational space. We do so in the context of a robotics-inspired search framework proposed in our lab that employs projections of the conformational space to guide search. Our findings indicate that structural profiles are most effective in obtaining physically realistic near-native conformations when employed in conjunction with physics-based energy functions. Our findings also show that these profiles are very effective when employed instead as projection coordinates to guide probabilistic search toward undersampled regions of the conformational space.

14.
15.
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins of similar structure provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is known then about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.

16.
SkyLine, a high-throughput homology modeling pipeline tool, detects and models true sequence homologs to a given protein structure. Structures and models are stored in SkyBase with links to computational function annotation, as calculated by MarkUs. The SkyLine/SkyBase/MarkUs technology represents a novel structure-based approach that is more objective and versatile than other protein classification resources. This structure-centric strategy provides a multi-dimensional organization and coverage of protein space at the levels of family, function, and genome. The concept of “modelability”, the ability to model sequences on related structures, provides a reliable criterion for membership in a protein family (“leverage”) and underlies the unique success of this approach. The overall procedure is illustrated by its application to START domains, which comprise a Biomedical Theme for the Northeast Structural Genomics Consortium as part of the Protein Structure Initiative. START domains are typically involved in the non-vesicular transport of lipids. While 19 experimentally determined structures are available, the family, whose evolutionary hierarchy is not well determined, is highly sequence diverse, and the ligand-binding potential of many family members is unknown. The SkyLine/SkyBase/MarkUs approach provides significant insights and predicts: (1) many more family members (~4,000) than any other resource; (2) the function for a large number of unannotated proteins; (3) instances of START domains in genomes from which they were thought to be absent; and (4) the existence of two types of novel proteins, those containing dual START domains and those containing N-terminal START domains.

17.
Membrane protein structural biology is a rapidly developing field with fundamental importance for elucidating key biological and biophysical processes including signal transduction, intercellular communication, and cellular transport. In addition to the intrinsic interest in this area of research, structural studies of membrane proteins have direct significance for the development of therapeutics that impact human health in diverse and important ways. In this article we demonstrate the potential of investigating the structure of membrane proteins using the reverse micelle forming surfactant dioctyl sulfosuccinate (AOT) in application to the prototypical model ion channel gramicidin A. Reverse micelles are surfactant-based nanoparticles which have been employed to investigate fundamental physical properties of biomolecules. The results of this solution NMR based study indicate that the AOT reverse micelle system is capable of refolding and stabilizing relatively high concentrations of the native conformation of gramicidin A. Importantly, pulsed-field-gradient NMR diffusion and NOESY experiments reveal stable gramicidin A homodimer interactions that bridge reverse micelle particles. The spectroscopic benefit of reverse micelle-membrane protein solubilization is also explored, and significant enhancement over commonly used micelle-based mimetic systems is demonstrated. These results establish the effectiveness of reverse micelle based studies of membrane proteins, and illustrate that membrane proteins solubilized by reverse micelles are compatible with high-resolution solution NMR techniques.

18.
MOTIVATION: Protein families can be defined based on structure or sequence similarity. We wanted to compare two protein family databases, one based on structural and one on sequence similarity, to investigate to what extent they overlap, the similarity in definition of corresponding families, and to create a list of large protein families with unknown structure as a resource for structural genomics. We also wanted to increase the sensitivity of fold assignment by exploiting protein family HMMs. RESULTS: We compared Pfam, a protein family database based on sequence similarity, to Scop, which is based on structural similarity. We found that 70% of the Scop families exist in Pfam while 57% of the Pfam families exist in Scop. Most families that occur in both databases correspond well to each other, but in some cases they are different. Such cases highlight situations in which structure and sequence approaches differ significantly. The comparison enabled us to compile a list of the largest families that do not occur in Scop; these are suitable targets for structure prediction and determination, and may be useful to guide projects in structural genomics. It can be noted that 13 out of the 20 largest protein families without a known structure are likely transmembrane proteins. We also exploited Pfam to increase the sensitivity of detecting homologs of proteins with known structure, by comparing query sequences to Pfam HMMs that correspond to Scop families. For SWISSPROT+TREMBL, this yielded an increase in fold assignment from 31% to 42% compared to using FASTA only. This method assigned a structure to 22% of the proteins in Saccharomyces cerevisiae, 24% in Escherichia coli, and 16% in Methanococcus jannaschii.

19.
Yeast is widely used to determine the tertiary structure of eukaryotic proteins, because of its ability to carry out post-translational modifications such as glycosylation. A mutant lacking S-adenosylmethionine synthesis has been reported as a suitable host for producing selenomethionine derivatives, which can help solve phase problems in protein crystallography. However, the mutant required external addition of S-adenosylmethionine for cell proliferation. Here, a selenomethionine-resistant Pichia pastoris mutant that showed S-adenosylmethionine autotrophy was isolated. Human lysozyme expressed by the mutant under the control of a constitutive promoter contained selenomethionine at 65% occupancy, sufficient for use as a selenomethionine derivative for single-wavelength anomalous dispersion phasing.

20.
Liu T, Geng X, Zheng X, Li R, Wang J. Amino Acids, 2012, 42(6): 2243-2249
Computational prediction of protein structural class based solely on sequence data remains a challenging problem in protein science. Existing methods differ in the protein sequence representation models and prediction engines adopted. In this study, a powerful feature extraction method, which combines the position-specific score matrix (PSSM) with auto covariance (AC) transformation, is introduced. Thus, a sample protein is represented by a series of discrete components, which can partially incorporate the long-range sequence order information and evolutionary information reflected in the PSI-BLAST profile. To verify the performance of our method, jackknife cross-validation tests are performed on four widely used benchmark datasets. Comparison of our results with existing methods shows that our method provides state-of-the-art performance for structural class prediction. A Web server that implements the proposed method is freely available at http://202.194.133.5/xinxi/AAC_PSSM_AC/index.htm.
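The auto covariance transformation mentioned in the abstract above can be sketched as follows. This uses a common textbook form of AC, in which each PSSM column is centred on its mean and the products of values `lag` positions apart are averaged; the exact normalization and any PSSM rescaling used by the authors are not given in the abstract, so treat those details and the function name as assumptions.

```python
def auto_covariance(pssm, max_lag):
    """Turn an L x C PSSM into a fixed-length vector of C * max_lag features.

    pssm: list of L rows (one per residue), each with C scores (typically 20).
    For column j and lag g the feature is
        sum_{i=0}^{L-g-1} (P[i][j] - mean_j) * (P[i+g][j] - mean_j) / (L - g).
    """
    length = len(pssm)
    if max_lag < 1 or max_lag >= length:
        raise ValueError("max_lag must be at least 1 and smaller than the sequence length")
    n_cols = len(pssm[0])
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for j in range(n_cols):
        for lag in range(1, max_lag + 1):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features
```

Because the output length depends only on the number of columns and `max_lag`, proteins of different lengths map to vectors of the same size, which is what allows a standard classifier to be trained on them.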
