Within the ever-expanding repertoire of known protein sequences and structures, many examples of evolving three-dimensional structures are emerging that illustrate the plasticity and robustness of protein folds. The mechanisms by which protein folds change often include the fusion of duplicated domains, followed by divergence through mutation. Such changes reflect both the stability of protein folds and the requirements of protein function.
We present a language for describing structural patterns of residues in protein structures and a method for the discovery of such patterns that recur in a set of protein structures. The patterns impose restrictions on the spatial position of each residue, their order along the amino acid chain, and which amino acids are allowed in each position. Unlike other methods for comparing sets of protein structures, our method is not based on pairwise structure comparisons, which are often time-consuming and can produce inconsistent results. Instead, the method simultaneously takes into account information from all structures in the search for conserved structure patterns, which are potential structure motifs. The method is based on describing the spatial neighborhood of each residue in each structure as a string and applying a sequence pattern discovery method to find patterns common to subsets of these strings. Finally, the method checks whether the similarities between the neighborhood strings correspond to spatially similar substructures. We apply the method to analyze sets of very disparate proteins from four different protein families: serine proteases, cuprodoxins, cysteine proteinases, and ferredoxins. The motifs found by the method correspond well to the site and motif information given in the annotation of these proteins in PDB, Swiss-Prot, and PROSITE. Furthermore, the motifs are confirmed by using the motif data to constrain the structural alignment of the proteins obtained with the program SAP. This gave the best superposition/alignment of the proteins given the motif assignment.
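The neighborhood-string step can be illustrated with a minimal sketch. The abstract does not specify how a neighborhood is encoded, so everything here is an assumption made for illustration: the function name `neighborhood_strings`, the use of C-alpha coordinates, the 10 Å default radius, and the choice to list neighbors in chain order.

```python
import math

def neighborhood_strings(sequence, ca_coords, radius=10.0):
    """Return one string per residue listing the amino acids whose
    C-alpha atom lies within `radius` of that residue's C-alpha atom.

    Neighbors are written in chain order, so each string is a
    subsequence of the full chain (an illustrative encoding only).
    """
    if len(sequence) != len(ca_coords):
        raise ValueError("sequence and coordinates must have equal length")
    strings = []
    for center in ca_coords:
        members = [
            sequence[j]
            for j, point in enumerate(ca_coords)
            if math.dist(center, point) <= radius
        ]
        strings.append("".join(members))
    return strings
```

The resulting strings could then be handed to any sequence pattern discovery tool; whether a shared string pattern really corresponds to a shared spatial arrangement still has to be verified separately, as the abstract notes.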
General and transferable statistical potentials to quantify the compatibility between local structures and local sequences of peptide fragments in proteins were derived. In the derivation, structure clusters of fragments are obtained by clustering five-residue fragments in native proteins based on their conformations represented by a local structure alphabet (de Brevern et al., Proteins 2000;41:271-287), secondary structure states, and solvent accessibilities. On the basis of the native sequences of the structurally clustered fragments, the probabilities of different amino acid sequences were estimated for each structure cluster. From the sequence probabilities, statistical energies as a function of sequence for a given structure were directly derived. The same sequence probabilities were employed in a database-matching approach to derive statistical energies as a function of local structure for a given sequence. Compared with prior models of local statistical potentials, we provide an integrated approach in which local conformations and local environments are treated jointly, structures are treated in units of fragments instead of individual residues so that coupling between the conformations of adjacent residues is included, and strong interdependences between the conformations of overlapping or neighboring fragment units are also considered. In tests including fragment threading, pseudosequence design, and local structure predictions, the potentials performed at least comparably to and, in most cases, better than a number of existing models applicable to the same contexts. This indicates the advantages of such an integrated approach for deriving local potentials and suggests that the statistical potentials derived here are applicable to sequence design and structure prediction.
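The step from sequence probabilities to statistical energies can be sketched with the standard inverse-Boltzmann form, E = -ln(P(sequence | cluster) / P_background), in units of kT. This is a simplified illustration, not the authors' derivation: it treats the five positions of a fragment as independent, which the abstract's integrated approach specifically avoids, and the function names, the pseudocount, and the background distribution are all assumptions.

```python
import math
from collections import Counter

def statistical_energies(cluster_sequences, background, pseudocount=1.0):
    """Per-position energies (in kT) for one structure cluster.

    cluster_sequences: equal-length fragment sequences in the cluster.
    background: dict mapping amino acid -> background probability.
    Assumes positions are independent (a simplification).
    """
    alphabet = sorted(background)
    length = len(cluster_sequences[0])
    total = len(cluster_sequences) + pseudocount * len(alphabet)
    energies = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in cluster_sequences)
        energies.append({
            aa: -math.log(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in alphabet
        })
    return energies

def fragment_energy(fragment, energies):
    """Sum the per-position energies of `fragment` for one cluster."""
    return sum(energies[pos][aa] for pos, aa in enumerate(fragment))
```

Lower energies mark sequences that are enriched in the cluster relative to background, so threading a fragment sequence against several clusters amounts to comparing `fragment_energy` values.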
We present an alternative to the classical Ramachandran plot (R-plot) to display local protein backbone structure. Instead of the (ϕ, ψ) backbone angles, which relate to the chemical architecture of polypeptides, generic helical parameters are used: the rotation or twist angle ϑ and the helical rise parameter d. Plots with these parameters provide a different view of the nature of local protein backbone structures. They allow local structures to be displayed in polar (d, ϑ)-coordinates, which is not possible for an R-plot, where structural regimes connected by periodicity appear disconnected. Other advantages include a clear discrimination of the handedness of a local structure and a larger spread of the different local structure domains, which can yield a better separation of different local secondary structure motifs, among others. We are not aware of any major disadvantage of classifying local polypeptide structures with the (d, ϑ)-plot rather than the R-plot, except that it requires some elementary computations. To facilitate usage of the new (d, ϑ)-plot for protein structures we provide a web application (http://agknapp.chemie.fu-berlin.de/secsass), which shows the (d, ϑ)-plot side-by-side with the R-plot.
In recent years, protein structure prediction using local structure information has made great progress. In this study, a novel and effective method is developed to predict the local structure and the folding fragments of proteins. First, the proteins with known structures are split into fragments. Second, these fragments, represented by dihedrals, are clustered to produce the building blocks (BBs). Third, an efficient machine learning method is used to predict the local structures of proteins from sequence profiles. Finally, a bi-gram model, trained by an iterated algorithm, is introduced to simulate the interactions of these BBs. For test proteins, the building-block lattice is constructed, which contains all the folding fragments of the proteins. The local structures and the optimal fragments are then obtained by a dynamic programming algorithm. The experiment is performed on a subset of the PDB database with sequence identity less than 25%. The results show that the method performs better than a method that uses only sequence information. When multiple paths are returned, the average classification accuracy of local structures is 72.27% and the average prediction accuracy of local structures is 67.72%, which is a significant improvement in comparison with previous studies. The method can predict not only the local structures but also the folding fragments of proteins. This work is helpful for ab initio protein structure prediction and, especially, for understanding the folding process of proteins.
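The final step, finding the best path through a lattice of building blocks under a bi-gram model, is the kind of problem usually solved with Viterbi-style dynamic programming. The sketch below is an assumed formulation rather than the authors' implementation: `emission` stands in for the per-position building-block probabilities predicted from sequence profiles, `transition` for the bi-gram probabilities, and the function name `best_path` and the small constant `tiny` are illustrative.

```python
import math

def best_path(emission, transition, tiny=1e-12):
    """Most probable building-block sequence through a lattice.

    emission: list (one entry per position) of dicts block -> probability.
    transition: dict (previous_block, block) -> bi-gram probability.
    `tiny` avoids log(0) for unseen block pairs.
    """
    score = {b: math.log(p + tiny) for b, p in emission[0].items()}
    back = []
    for probs in emission[1:]:
        new_score, pointers = {}, {}
        for block, p in probs.items():
            # Best predecessor for this block under the bi-gram model.
            prev, s = max(
                ((a, score[a] + math.log(transition.get((a, block), 0.0) + tiny))
                 for a in score),
                key=lambda item: item[1],
            )
            new_score[block] = s + math.log(p + tiny)
            pointers[block] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final block.
    path = [max(score, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With strongly self-favoring transitions, the bi-gram term can override a weak per-position preference, which is the coupling between neighboring blocks that a purely position-wise predictor misses.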
It is generally assumed that CTCF-binding sites are synonymous with the demarcation of expression domains by promoting the formation of chromatin loops. We have proposed earlier, however, that such features may be context-dependent. In support of this notion, we show here that chromatin loop structures, impinging on CTCF-binding sites 1/2 and 3/4 at the 5′ and 3′-ends, respectively, within the maternal allele of the H19 imprinting control region (ICR), differ significantly. Although abrogation of CTCF binding to the maternal H19 ICR allele results in loss of chromatin loops in the 3′-region, there is a dramatic gain of long-range chromatin loops impinging on the 5′-region. As the degree of occupancy of its four CTCF-binding sites discriminates between the chromatin insulator and replication timing functions, we submit that the CTCF-binding sites within the H19 ICR are functionally diverse and organize context-dependent higher order chromatin conformations.
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities are reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8 percentage points (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%.
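The abstract does not give the formula for the entropy index. A common way to define an "equivalent number" of states, assumed here, is the exponential of the Shannon entropy of the predicted block probabilities at one position: Neq = exp(-Σ p_x ln p_x). Under that assumption, the index can be computed as follows (the function name `neq` is illustrative).

```python
import math

def neq(probabilities):
    """Equivalent number of states for one sequence position.

    probabilities: predicted weights for each Protein Block (16 values
    for the alphabet described above); they are normalized here.
    Returns a value between 1 (one block certain) and the number of
    blocks (all blocks equally likely).
    """
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    entropy = -sum(
        (p / total) * math.log(p / total) for p in probabilities if p > 0
    )
    return math.exp(entropy)
```

A position where the predictor is confident gives a value near 1, while an uninformative prediction over 16 blocks gives a value near 16, which is why such an index can serve as a per-position estimate of prediction quality.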
Torsion angle alignment (TALI) is a novel approach to local structural motif alignment, based on backbone torsion angles (phi, psi) rather than the more traditional atomic distance matrices. Representation of a protein structure in the form of a sequence of torsion angles enables easy integration of sequence and structural information, and adopts mature techniques in sequence alignment to improve performance and alignment quality. We show that TALI is able to match local structural motifs as well as identify global structural similarity. TALI is also compared to other structure alignment methods such as DALI, CE, and SSM, as well as sequence alignment based on PSI-BLAST; TALI is shown to be equally successful as, or more successful than, these other methods when applied to challenging structural alignments. The inference of the evolutionary tree of class II aminoacyl-tRNA synthetase shows the potential for TALI in estimating protein structural evolution and in identifying structural divergence among homologous structures. Availability: http://redcat.cse.sc.edu/index.php/Project:TALI/.
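To show how a sequence of (phi, psi) pairs can be fed to a standard sequence-alignment recurrence, here is a minimal Smith-Waterman-style sketch. It is not the TALI scoring scheme, which the abstract does not describe: the similarity function, the 60-degree threshold, the linear gap penalty, and all function names are assumptions chosen for illustration.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def torsion_score(res_a, res_b, threshold=60.0):
    """Similarity of two (phi, psi) pairs: 1.0 for identical angles,
    0.0 at a mean deviation of `threshold`, negative beyond it."""
    deviation = (angle_diff(res_a[0], res_b[0]) + angle_diff(res_a[1], res_b[1])) / 2.0
    return (threshold - deviation) / threshold

def local_alignment_score(a, b, gap=-1.0):
    """Best local alignment score between two torsion-angle sequences,
    using the Smith-Waterman recurrence with a linear gap penalty."""
    best = 0.0
    prev = [0.0] * (len(b) + 1)
    for res_a in a:
        cur = [0.0]
        for j, res_b in enumerate(b, start=1):
            s = max(
                0.0,
                prev[j - 1] + torsion_score(res_a, res_b),  # align the pair
                prev[j] + gap,                               # gap in b
                cur[j - 1] + gap,                            # gap in a
            )
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best
```

Because angles are periodic, the difference is taken on the circle, so 179 and -179 degrees count as 2 degrees apart rather than 358; any torsion-based scoring has to handle this wrap-around.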
One fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding a site of interest. We find that the network can predict the wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely to be correct. Predictions of the consensus are less accurate and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and likely targets for protein engineering.
Interresidue contacts in protein structures and at protein-protein interfaces are classically described by the amino acid types of the interacting residues, and the local structural context of the contact, if described at all, is given in terms of secondary structures. In this study, we present an alternative analysis of interresidue contacts using local structures defined by the structural alphabet introduced by Camproux et al. This structural alphabet allows a 3D structure to be described as a sequence of prototype fragments, called structural letters, of 27 different types. Each residue can then be assigned to a particular local structure, even in loop regions. The analysis of interresidue contacts within protein structures, defined using Voronoi tessellations, reveals that pairwise contact specificity is greater in terms of structural letters than amino acids. Using a simple heuristic based on specificity score comparison, we find that 74% of the long-range contacts within protein structures are better described using structural letters than amino acid types. The investigation is extended to a set of protein-protein complexes, showing that similar global rules apply as for intraprotein contacts, with 64% of the interprotein contacts best described by local structures. We then present an evaluation of pairing functions that integrate structural letters into decoy scoring and show that some complexes could benefit from the use of structural letter-based pairing functions.
We have demonstrated earlier that protein microenvironments were conserved around disulfide-bridged cystine motifs with similar functions, irrespective of diversity in protein sequences. Here, cysteine thiol modifications were characterized based on protein microenvironments, secondary structures and specific protein functions. The protein microenvironment around an amino acid was defined as the summation of hydrophobic contributions from the surrounding protein fragments and the solvent molecules present within its first contact shell. Cysteine functions (modifications) were grouped into enzymatic and non-enzymatic classes. The modifications studied were disulfide formation, thioether formation, metal binding, nitrosylation, acylation, selenylation, glutathionylation, sulfenylation, and ribosylation. 1079 enzymatic proteins were reported from high-resolution crystal structures. Protein microenvironments around cysteine thiols, derived from these crystal structures, were clustered into 3 groups: buried-hydrophobic, intermediate, and exposed-hydrophilic. Characterization of cysteine functions was statistically meaningful for the 4 modifications (disulfide formation, thioether formation, sulfenylation, and iron/zinc binding) that have a sufficient amount of data in the current dataset. Results showed that protein microenvironment, secondary structure and protein functions were conserved for enzymatic cysteine functions, in contrast to the same functions in non-enzymatic cysteines. Disulfide-forming enzymatic cysteines were tightly packed within the intermediate protein microenvironment cluster, had alpha-helical conformation and mostly belonged to the CxxC motif of electron transport proteins. Disulfide-forming non-enzymatic cysteines did not belong to a conserved motif and had variable secondary structures. Similarly, enzymatic thioether-forming cysteines had a conserved microenvironment compared to non-enzymatic cysteines.
Based on the compatibility between protein microenvironment and cysteine modifications, more efficient drug molecules could be designed against cysteine-related diseases.
Discretization of protein conformational space and fragment assembly methods simplify the search for native structures. These methods, mostly of Monte Carlo and genetic type, do not exploit, however, the fact that short fragments describing consecutive parts of proteins are conformation-dependent. Yet, this information should be useful in improving ab initio and comparative protein structure modeling. In a preliminary study, we assessed the possibility of using greedy algorithms for protein structure reconstruction based on the assembly of fragments of four-residue length. Greedy algorithms differ from Monte Carlo and genetic approaches in that they grow a polypeptide chain one fragment after another. Here, we move one step further in complexity, and provide strong evidence that the dependence between consecutive local conformations during assembly makes possible the reconstruction of protein structures from their secondary structures using a Go potential. Overall, our procedure can reproduce 20 protein structures of 50-164 amino acids within 2.7 to 6.5 Å RMSd and is able to identify native topologies for all proteins, although some targets are stabilized by very long-range interactions.
Surprisingly, cryosolvents may mimic the effects of ionic solutes on the structures and functions of macromolecular assemblages, showing additive or opposite effects depending on the respective concentrations. These interactive effects are hard to analyze precisely because they may result from so many possible contributions. However, studies on model systems clearly show the nature of the interactive effects and bring about useful information concerning the mechanism of action of cryosolvents and ions and the response of enzyme systems. Such results suggest new studies on the interaction of biopolymers and water and their possible impact on the cryobehavior of highly organized and living systems.
A new approach to the functional classification of protein 3D structures is described with application to some examples from structural genomics. This approach is based on functional site prediction with THEMATICS and POOL. THEMATICS employs calculated electrostatic potentials of the query structure. POOL is a machine learning method that utilizes THEMATICS features and has been shown to predict accurate, precise, highly localized interaction sites. Extension to the functional classification of structural genomics proteins is now described. Predicted functionally important residues are structurally aligned with those of proteins with previously characterized biochemical functions. A 3D structure match at the predicted local functional site then serves as a more reliable predictor of biochemical function than an overall structure match. Annotation is confirmed for a structural genomics protein with the ribulose phosphate binding barrel (RPBB) fold. A putative glucoamylase from Bacteroides fragilis (PDB ID 3eu8) is shown to be probably not a glucoamylase. Finally, a structural genomics protein from Streptomyces coelicolor annotated as an enoyl-CoA hydratase (PDB ID 3g64) is shown to be misannotated. Its predicted active site does not match the well-characterized enoyl-CoA hydratases of similar structure but rather bears closer resemblance to those of a dehalogenase with similar fold.