Similar articles
 20 similar articles found (search time: 15 ms)
1.
The quality of three-dimensional homology models derived from protein sequences provides an independent measure of the suitability of a protein sequence for a certain fold. We have used automated homology modeling and model assessment tools to identify putative nuclear hormone receptor ligand-binding domains in the genome of Caenorhabditis elegans. Our results indicate that the availability of multiple crystal structures is crucial to obtaining useful models in this receptor family. The majority of annotated mammalian nuclear hormone receptors could be assigned to a ligand-binding domain fold by using the best model derived from any of four template structures. This strategy also assigned the ligand-binding domain fold to a number of C. elegans sequences without prior annotation. Interestingly, the retinoic acid receptor crystal structure contributed most to the number of sequences that could be assigned to a ligand-binding domain fold. Several causes for this can be suggested, including the high quality of this protein structure in terms of our assessment tools, similarity between the biological function or ligand of this receptor and the modeled genes, and gene duplication in C. elegans.

2.
C Sander, R Schneider. Proteins, 1991, 9(1):56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
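
The idea of a length-dependent homology threshold can be sketched as follows. The abstract does not state the curve itself; the power-law form and the constants below are the commonly quoted version of the HSSP threshold and are used here as an assumption, and the function names are hypothetical.

```python
def homology_threshold(length: int) -> float:
    """Percent identity needed to infer structural homology.

    Assumed form (not given in the abstract): a power law for
    alignments of 10 to 80 residues and a flat value above that.
    """
    if length < 10:
        raise ValueError("curve is not defined for alignments shorter than 10 residues")
    if length > 80:
        return 24.8
    return 290.15 * length ** -0.562


def is_structurally_homologous(percent_identity: float, length: int) -> bool:
    """True if an alignment lies on or above the assumed threshold curve."""
    return percent_identity >= homology_threshold(length)
```

Under these assumed constants, 30% identity is accepted for a 100-residue alignment but rejected for a 20-residue one, which illustrates why short alignments need much higher similarity.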

3.
The three-dimensional structure of a protein molecule appears to depend on the amino acid sequence of the protein in an as yet incompletely described manner. If the amino acid sequence is replaced by a numerical sequence of values representing a physical or chemical property of amino acids, the resulting numerical sequence is amenable to autocorrelation analysis. Further, if certain geometrical parameters are calculated from the three-dimensional structure of a protein to form a configurational series, pairs of property series and configurational series can be analyzed by cross-correlation techniques. The database for the analysis was the three-dimensional structures of ten proteins as determined by X-ray crystallography. Such analysis yields the result that the hydrophobicity of an amino acid residue in a protein influences the orientation angle of the amino acid side chain. This result is consistent with the widely current “oil-drop” model of protein structure. Hydrophobicity also appears to influence the backbone dihedral angle φ, but not ψ. Such a directional effect cannot be explained by a current model of information transfer in protein helices. The magnitude of the cross correlations does not appear to be satisfactory for construction of a transfer function model for the prediction of general features of protein structure from amino acid sequences.
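
A minimal sketch of the first step described above: replacing a sequence by a numerical property series and computing its autocorrelation. The abstract does not say which hydrophobicity scale was used; the Kyte-Doolittle hydropathy values below are an assumption, and the function names are hypothetical.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_series(sequence, scale=KYTE_DOOLITTLE):
    """Replace each residue letter by its property value."""
    return [scale[aa] for aa in sequence.upper()]


def autocorrelation(series, lag):
    """Sample autocorrelation of a numerical series at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mu = mean(series)
    variance_sum = sum((x - mu) ** 2 for x in series)
    if variance_sum == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance_sum = sum(
        (series[i] - mu) * (series[i + lag] - mu) for i in range(n - lag)
    )
    return covariance_sum / variance_sum
```

For a strictly alternating hydrophobic/hydrophilic sequence such as "IKIKIKIK", the lag-1 value is strongly negative and the lag-2 value strongly positive, the kind of periodic signal that autocorrelation analysis is meant to expose.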

4.
Two recently published but independently derived structures, namely the X-ray crystallographic structure of ribosomal protein S7 and the "binding pocket" for this protein in a three-dimensional model of the 16S rRNA, have been correlated with one another. The known rRNA-protein interactions for S7 include a minimum binding site, a number of footprint sites, and two RNA-protein crosslink sites on the 16S rRNA, all of which form a compact group in the published 16S rRNA model (despite the fact that these interactions were not used as primary modeling constraints in building that model). The amino acids in protein S7 that are involved in the two crosslinks to 16S rRNA have also been determined in previous studies, and here we have used these sites to orient the crystallographic structure of S7 relative to its rRNA binding pocket. Some minor alterations were made to the rRNA model to improve the fit. In the resulting structure, the principal positively charged surface of the protein is in contact with the 16S rRNA, and all of the RNA-protein interaction data are satisfied. The quality of the fit gives added confidence as to the validity of the 16S rRNA model. Protein S7 is furthermore known to be crosslinked both to P site-bound tRNA and to mRNA at positions upstream of the P site codon; the matched S7-16S rRNA structure makes a prediction as to the location of this crosslink site within the protein molecule.

5.
Prediction of protein residue contacts with a PDB-derived likelihood matrix (total citations: 8; self-citations: 0; citations by others: 8)
Proteins with similar folds often display common patterns of residue variability. A widely discussed question is how these patterns can be identified and deconvoluted to predict protein structure. In this respect, correlated mutation analysis (CMA) has shown considerable promise. CMA compares multiple members of a protein family and detects residues that remain constant or mutate in tandem. Often this behavior points to structural or functional interdependence between residues. CMA has been used to predict pairs of amino acids that are distant in the primary sequence but likely to form close contacts in the native three-dimensional structure. Until now these methods have used evolutionary or biophysical models to score the fit between residues. We wished to test whether empirical methods, derived from known protein structures, would provide useful predictive power for CMA. We analyzed 672 known protein structures, derived contact likelihood scores for all possible amino acid pairs, and used these scores to predict contacts. We then tested the method on 118 different protein families for which structures have been solved to atomic resolution. The mean performance was almost seven times better than random prediction. Used in concert with secondary structure prediction, the new CMA method could supply restraints for predicting still undetermined structures.
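
One plausible way to derive contact likelihood scores from known structures is a log-odds ratio: how often an amino acid pair is seen in contact, relative to how often that pair occurs at all. The abstract does not describe the actual scoring formula, so the log-odds form, the input layout, and the function names below are assumptions for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def pair_key(a, b):
    """Order-independent key for an amino acid pair."""
    return tuple(sorted((a, b)))


def contact_likelihoods(structures):
    """Log-odds contact score per amino acid pair.

    `structures` is a list of (sequence, contacts) tuples, where
    `contacts` is a set of 0-based index pairs (i, j) with i < j.
    Pairs never observed in contact receive no score.
    """
    in_contact = Counter()
    background = Counter()
    for sequence, contacts in structures:
        for i, j in combinations(range(len(sequence)), 2):
            key = pair_key(sequence[i], sequence[j])
            background[key] += 1
            if (i, j) in contacts:
                in_contact[key] += 1
    total_contact = sum(in_contact.values())
    total_background = sum(background.values())
    return {
        key: math.log((n / total_contact) / (background[key] / total_background))
        for key, n in in_contact.items()
    }
```

A positive score marks a pair that is enriched among contacts; such scores could then be combined with correlated-mutation signals to rank candidate contacts.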

6.
Two methods of qualitative analysis of sequence distribution in DNA and protein are presented. The first method is based on the finding that the frequency of occurrence of each nucleotide in a defined sequence with functional significance deviates, to a greater or lesser extent, from a uniform distribution. The deviation found in this defined sequence seems to parallel the function of this sequence. In the second method, two model compounds (trypsin and its inhibitor) have been used to examine the topological fit between their local structures. An acrophilicity parameter for amino acids was used to construct the topological structure. Both methods may find practical application in algorithms to design functional DNA and protein molecules.

7.
Shestopalov BV. Tsitologiia, 2003, 45(7):707-713
In the previous paper (Shestopalov, 2003) we presented the amino acid code of protein secondary structure as a partial solution of the fundamental problem of calculating protein three-dimensional structure from the amino acid sequence. Here a statistical model of the code is described. The model is based on the structural data from 2258 protein chains (417,112 amino acid residues used). The secondary structure calculated using the model coincides with the observed secondary structure for 60% of residues in the training subset and 61% in the test subset (104 protein chains and 21,166 residues used). This is equal to the threshold value for all secondary structure calculations based on models where, as here, only the nearest and middle-range interactions are considered. Therefore the constructed model can be applied to protein structure prediction from the amino acid sequence, especially when additional information is used along with expert analysis, as in the most successful prediction methods. The model can be used for analysis of the secondary structure changes during protein folding by comparison of the calculated and observed secondary structures. The information about the conformationally invariant segments can serve for the simulation of supersecondary structure formation. One can try to obtain and examine the protein subset in which the calculated and observed secondary structures are very similar.
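
The agreement figures quoted above are per-residue percentages. A minimal sketch of that measure, assuming the calculated and observed secondary structures are given as equal-length strings of state letters (the H/E/C letters and the function name are illustrative, not taken from the paper):

```python
def percent_agreement(calculated: str, observed: str) -> float:
    """Percentage of residues whose calculated state matches the observed one."""
    if len(calculated) != len(observed):
        raise ValueError("both structure strings must have the same length")
    if not observed:
        raise ValueError("structure strings must not be empty")
    matches = sum(c == o for c, o in zip(calculated, observed))
    return 100.0 * matches / len(observed)
```

For example, the strings "HHHEEC" and "HHHCEC" differ at one of six positions, giving an agreement of about 83%.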

8.
L B Hendry, L W Roach, V B Mahesh. Steroids, 1999, 64(9):570-575
A novel computational technology derived from gene structure has been developed for screening, selecting, and designing pharmaceutical candidates. Pharmacophores, or three-dimensional molecular blueprints, were created by docking known active structures into specific sites in partially unwound DNA. The pharmacophores are composites of the van der Waals surfaces and hydrogen bonding functional groups of active molecules. Once created, molecules can be inserted into the pharmacophores and degree of fit quantitated by the volume of the molecule that fits within the composite surface and the magnitude of electrostatic interactions with charged atoms on the pharmacophore. Here, we describe endocrine pharmacophores and in particular the estrogen pharmacophore derived by docking active ligands into partially unwound DNA. Fit of candidate structures into the estrogen pharmacophore correlated with estrogenic (uterotropic) activity. For example, the super active estrogens moxestrol and 11beta-acetoxyestradiol fit better within the site than estradiol. Bisphenol A, a putative endocrine disrupter with suspected estrogenic activity, was a poor fit in the pharmacophore. Consistent with this prediction, bisphenol A was recently shown to lack uterotropic activity. The capacity of the endocrine pharmacophores to predict certain nontarget activities was demonstrated by using the antiandrogen cyproterone acetate that did not fit the estrogen or thyroid pharmacophores but fit partially into the progestin and glucocorticoid pharmacophores. Cyproterone acetate has been reported to have weak progestational and glucocorticoid activities. The pharmacophores provide for the first time a multidimensional computational method that can simultaneously predict multiple activities of diverse molecular structures.

9.
Wide-angle X-ray solution scattering (WAXS) patterns contain substantial information about the three-dimensional structure of a protein. Although WAXS data have far less information than is required for determination of a full three-dimensional structure, the actual amount of information contained in a WAXS pattern has not been carefully quantified. Here we carry out an analysis of the amount of information that can be extracted from a WAXS pattern and demonstrate that it is adequate to estimate the secondary-structure content of a protein and to strongly limit its possible tertiary structures. WAXS patterns computed from the atomic coordinates of a set of 498 protein domains representing all of known fold space were used as the basis for constructing a multidimensional space of all corresponding WAXS patterns (‘WAXS space’). Within WAXS space, each scattering pattern is represented by a single vector. A principal components analysis was carried out to identify those directions in WAXS space that provide the greatest discrimination among patterns. The number of dimensions that provide significant discrimination among protein folds agrees well with the number of independent parameters estimated from a naïve Shannon sampling theorem approach. Estimates of the relative abundances of secondary structures were made using training/test sets derived from this data set. The average error in the estimate of α-helical content was 11%, and of β-sheet content was 9%. The distribution of proteins that are members of the four structure classes, α, β, α/β and α+β, are well separated in WAXS space when data extending to a spacing of 2.2 Å are used. Quantification of the information embedded within a WAXS pattern indicates that these data can be used as a powerful constraint in homology modeling of protein structures.

10.
Shestopalov BV. Tsitologiia, 2003, 45(7):702-706
The calculation of protein three-dimensional structure from the amino acid sequence is a fundamental problem to be solved. This paper presents principles of the code theory of protein secondary structure, and their consequence, the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The bases of the theory are: 1) the name secondary structure is assigned to the conformation stabilized only by the nearest (intraresidual) and middle-range (at a distance no more than that between residues i and i + 5) interactions; 2) the secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments; 3) the alpha-helices, beta-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2), (i, i + 1), according to the numbers of residues per period, 3.6, 2, 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability; 6) overlaps of structurons of one and the same structure generate the longer segments of this structure; 7) overlapping of structurons of different structures is forbidden, and therefore selection of codons is required; the codon selection is hierarchical; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. There are two possible kinds of model construction based on the theory: the physical one using physical properties of amino acid residues, and the statistical one using results of statistical analysis of a great body of structural data.
Some evident consequences of the theory are: a) the theory can be used for calculating the secondary structure from the amino acid sequence as a partial solution of the problem of calculation of protein three-dimensional structure from the amino acid sequence, and the calculated secondary structure and codon strength distribution can be used for simulating the next step of protein folding; b) one can propose that the same secondary structures can be folded into different tertiary structures and, vice versa, different secondary structures can be folded into the same tertiary structures, provided codon distributions are considered also; c) codons can be considered as first elements of protein three-dimensional structure language.
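
The pairing rule in point 3 can be made concrete with a short sketch that lists every candidate codon in a sequence: residue pairs separated by 4, 2, and 1 positions for alpha-helix, beta-strand, and coil, respectively. Only the enumeration is shown; codon strengths and the hierarchical selection are not specified in the abstract and are left out. The names below are hypothetical.

```python
# Sequence offsets between the two residues of a codon, per structure type.
CODON_OFFSETS = {"alpha-helix": 4, "beta-strand": 2, "coil": 1}


def codon_pairs(sequence: str):
    """List candidate codons as (i, j, residue pair) with 0-based indices."""
    pairs = {}
    for structure, offset in CODON_OFFSETS.items():
        pairs[structure] = [
            (i, i + offset, sequence[i] + sequence[i + offset])
            for i in range(len(sequence) - offset)
        ]
    return pairs
```

For the six-residue sequence "ACDEFG" this yields two alpha-helix codons, four beta-strand codons, and five coil codons, all of which would then compete in the selection step.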

11.
Despite the increasing number of published protein structures, and the fact that each protein's function relies on its three-dimensional structure, there is limited access to automatic programs used for the identification of critical residues from the protein structure, compared with those based on protein sequence. Here we present a new algorithm based on network analysis applied exclusively on protein structures to identify critical residues. Our results show that this method identifies critical residues for protein function with high reliability and improves automatic sequence-based approaches and previous network-based approaches. The reliability of the method depends on the conformational diversity screened for the protein of interest. We have designed a web site to give access to this software at http://bis.ifc.unam.mx/jamming/. In summary, a new method is presented that relates critical residues for protein function with the most traversed residues in networks derived from protein structures. A unique feature of the method is the inclusion of the conformational diversity of proteins in the prediction, thus reproducing a basic feature of the structure/function relationship of proteins.

12.
A low-resolution scoring function for the selection of native and near-native structures from a set of predicted structures for a given protein sequence has been developed. The scoring function, ProVal (Protein Validate), used several variables that describe an aspect of protein structure for which the proximity to the native structure can be assessed quantitatively. Among the parameters included are a packing estimate, surface areas, and the contact order. A partial least squares for latent variables (PLS) model was built for each candidate set of the 28 decoy sets of structures generated for 22 different proteins using the described parameters as independent variables. The C(alpha) RMS of the candidate structures versus the experimental structure was used as the dependent variable. The final generalized scoring function was an average of all models derived, ensuring that the function was not optimized for specific fold classes or method of structure generation of the candidate folds. The results show that the crystal structure was scored best in 64% of the 28 test sets and was clearly separated from the decoys in many examples. In all the other cases in which the crystal structure did not rank first, it ranked within the top 10%. Thus, although ProVal could not distinguish between predicted structures that were similar overall in fold quality due to its inherently low resolution, it can clearly be used as a primary filter to eliminate approximately 90% of fold candidates generated by current prediction methods from all-atom modeling and further evaluation. The correlation between the predicted and actual C(alpha) RMS values varies considerably between the candidate fold sets.
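
The averaging and filtering steps can be sketched as follows, assuming each fitted model reduces to a linear predictor (an intercept plus one coefficient per structural descriptor) of the C(alpha) RMS. That representation, the descriptor layout, and all names are assumptions for illustration; this is not the ProVal implementation.

```python
import math


def predict(model, descriptors):
    """Predicted C(alpha) RMS from one linear model: (intercept, coefficients)."""
    intercept, coefficients = model
    return intercept + sum(c * x for c, x in zip(coefficients, descriptors))


def averaged_score(models, descriptors):
    """Generalized score: the mean prediction over all per-set models."""
    return sum(predict(m, descriptors) for m in models) / len(models)


def primary_filter(models, candidates, keep_fraction=0.10):
    """Keep the best-scoring fraction of candidates (lowest predicted RMS).

    `candidates` maps a candidate name to its descriptor vector.
    """
    ranked = sorted(candidates, key=lambda name: averaged_score(models, candidates[name]))
    keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[:keep]
```

With the default `keep_fraction` of 0.10, the filter discards roughly 90% of the candidates, matching the use as a primary filter described in the abstract.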

13.
Contact potential that recognizes the correct folding of globular proteins (total citations: 29; self-citations: 0; citations by others: 29)
We have devised a continuous function of interresidue contacts in globular proteins such that the X-ray crystal structure has a lower function value than that of thousands of protein-like alternative conformations. Although we fit the adjustable parameters of the potential using only 10,000 alternative structures for a selected training set of 37 proteins, a grand total of 530,000 constraints was satisfied, derived from 73 proteins and their numerous alternative conformations. In every case where the native conformation is adequately globular and compact, according to objective criteria we have developed, the potential function always favors the native over all alternatives by a substantial margin. This is true even for an additional three proteins never used in any way in the fitting procedure. Conformations differing only slightly from the native, such as those coming from crystal structures of the same protein complexed with different ligands or from crystal structures of point mutants, have function values very similar to the native's and always less than those of alternatives derived from substantially different crystal structures. This holds for all 95 structures that are homologous to one or another of various proteins we used. Realizing that this potential should be useful for modeling the conformation of new protein sequences from the body of protein crystal structures, we suggest a test for deciding whether a nearly correct approximation to the native conformation has been found.

14.
Prediction of protein structure depends on the accuracy and complexity of the models used. Here, we represent the polypeptide chain by a sequence of rigid fragments that are concatenated without any degrees of freedom. Fragments chosen from a library of representative fragments are fit to the native structure using a greedy build-up method. This gives a one-dimensional representation of native protein three-dimensional structure whose quality depends on the nature of the library. We use a novel clustering method to construct libraries that differ in the fragment length (four to seven residues) and number of representative fragments they contain (25-300). Each library is characterized by the quality of fit (accuracy) and the number of allowed states per residue (complexity). We find that the accuracy depends on the complexity and varies from 2.9 Å for a 2.7-state model based on fragments of length 7 to 0.76 Å for a 15-state model based on fragments of length 5. Our goal is to find representations that are both accurate and economical (low complexity). The models defined here are substantially better in this regard: with ten states per residue we approximate native protein structure to 1 Å, compared to over 20 states per residue needed previously. For the same complexity, we find that longer fragments provide better fits. Unfortunately, libraries of longer fragments must be much larger (for ten states per residue, a seven-residue library is 100 times larger than a five-residue library). As the number of known protein native structures increases, it will be possible to construct larger libraries to better exploit this correlation between neighboring residues. Our fragment libraries, which offer a wide range of optimal fragments suited to different accuracies of fit, may prove to be useful for generating better decoy sets for ab initio protein folding and for generating accurate loop conformations in homology modeling.
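
The relation between library size and complexity is not spelled out in the abstract. The sketch below assumes that consecutive fragments overlap by three residues, so that a library of s fragments of length f gives s^(1/(f-3)) states per residue. This assumption is chosen because it reproduces the figures quoted above; the function names are hypothetical.

```python
def states_per_residue(library_size: int, fragment_length: int, overlap: int = 3) -> float:
    """Complexity of a library, assuming fragments overlap by `overlap` residues."""
    return library_size ** (1.0 / (fragment_length - overlap))


def library_size_for(states: float, fragment_length: int, overlap: int = 3) -> float:
    """Library size needed to reach a given number of states per residue."""
    return states ** (fragment_length - overlap)
```

Under this assumption, a 15-state model with five-residue fragments corresponds to a library of 225 fragments (within the stated 25-300 range), and at ten states per residue a seven-residue library needs 10^4 fragments against 10^2 for a five-residue library, the factor of 100 mentioned in the abstract.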

15.
We describe protein-protein recognition within the frame of the random energy model of statistical physics. We simulate, by docking the component proteins, the process of association of two proteins that form a complex. We obtain the energy spectrum of a set of protein-protein complexes of known three-dimensional structure by performing docking in random orientations and scoring the models thus generated. We use a coarse protein representation where each amino acid residue is replaced by its Voronoï cell, and derive a scoring function by applying the evolutionary learning program ROGER to a set of parameters measured on that representation. Taking the scores of the docking models to be interaction energies, we obtain energy spectra for the complexes and fit them to a Gaussian distribution, from which we derive physical parameters such as a glass transition temperature and a specificity transition temperature.

16.
A molecular envelope of the beta-mannosidase from Trichoderma reesei has been obtained by combined use of solution small-angle X-ray scattering (SAXS) and protein crystallography. Crystallographic data at 4 Å resolution have been used to enhance the informational content of the SAXS data and to obtain an independent, more detailed protein shape. The phased molecular replacement technique using a low resolution SAXS model, building, and refinement of a free atom model has been employed successfully. The SAXS and crystallographic free atom models exhibit a similar globular form and were used to assess available crystallographic models of glycosyl hydrolases. The structure of the beta-galactosidase, a member of the family 2, clan GHA glycosyl hydrolases, shows an excellent fit to the experimental molecular envelope and distance distribution function of the beta-mannosidase, indicating gross similarities in their three-dimensional structures. The secondary structure of beta-mannosidase quantified by circular dichroism measurements is in good agreement with that of beta-galactosidase. We show that a comparison of distance distribution functions in combination with 1D and 2D sequence alignment techniques was able to restrict the number of possible structurally homologous proteins. The method could be applied as a general method in structural genomics and related fields once protein solution scattering data are available.

17.
How does a folding protein negotiate a vast, featureless conformational landscape and adopt its native structure in biological real time? Motivated by this search problem, we developed a novel algorithm to compare protein structures. Procedures to identify structural analogs are typically conducted in three-dimensional space: the tertiary structure of a target protein is matched against each candidate in a database of structures, and goodness of fit is evaluated by a distance-based measure, such as the root-mean-square distance between target and candidate. This is an expensive approach because three-dimensional space is complex. Here, we transform the problem into a simpler one-dimensional procedure. Specifically, we identify and label the 11 most populated residue basins in a database of high-resolution protein structures. Using this 11-letter alphabet, any protein's three-dimensional structure can be transformed into a one-dimensional string by mapping each residue onto its corresponding basin. Similarity between the resultant basin strings can then be evaluated by conventional sequence-based comparison. The disorder → order folding transition is abridged on both sides. At the onset, folding conditions necessitate formation of hydrogen-bonded scaffold elements on which proteins are assembled, severely restricting the magnitude of accessible conformational space. Near the end, chain topology is established prior to emergence of the close-packed native state. At this latter stage of folding, the chain remains molten, and residues populate natural basins that are approximated by the 11 basins derived here. In essence, our algorithm reduces the protein-folding search problem to mapping the amino acid sequence onto a restricted basin string.

18.
Theories of protein folding often consider contributions from three fundamental elements: loops, hydrophobic interactions, and secondary structures. The pathway of protein folding, the rate of folding, and the final folded structure should be predictable if the energetic contributions to folding of these fundamental factors were properly understood. αtα is a helix-turn-helix peptide that was developed by de novo design to provide a model system for the study of these important elements of protein folding. Hydrogen exchange experiments were performed on selectively 15N-labeled αtα and used to calculate the stability of hydrogen bonds within the peptide. The resulting pattern of hydrogen bond stability was analyzed using a version of the Lifson-Roig model that was extended to include a statistical parameter for tertiary interactions. This parameter, x, represents the additional statistical weight conferred upon a helical state by a tertiary contact. The hydrogen exchange data are most closely fit by the XHC model with an x parameter of 9.25. Thus the statistical weight of a hydrophobic tertiary contact is approximately 5.8 times the statistical weight for helix formation by alanine. The value for the x parameter derived from this study should provide a basis for the understanding of the relationship between hydrophobic cluster formation and secondary structure formation during the early stages of protein folding.
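
The two figures quoted above imply a value for the helix weight of alanine. As a worked check, writing that weight as w_Ala (a symbol assumed here; the abstract does not name it):

```latex
\frac{x}{w_{\mathrm{Ala}}} \approx 5.8
\quad\Longrightarrow\quad
w_{\mathrm{Ala}} \approx \frac{x}{5.8} = \frac{9.25}{5.8} \approx 1.59
```

This back-calculated value is an inference from the abstract's numbers, not a figure reported in it.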

19.
S Pervaiz, K Brew. FASEB journal, 1987, 1(3):209-214
Although the serum protein alpha 1-acid glycoprotein (AGP) or orosomucoid has been extensively studied, its relationships with other proteins have been controversial and its precise physiological function has remained unclear. It is shown here that AGP is significantly similar in amino acid sequence and in the locations of introns in its structural gene to members of a protein superfamily that includes serum retinol-binding protein (RBP), beta-lactoglobulin (LG), alpha 2u-globulin, and protein HC (alpha 1-microglobulin). The view that the three-dimensional structure of AGP is closely similar to the published structures of RBP and LG is supported by its homology with these proteins, similarities in disulfide bond arrangements, and its secondary structure profile, predicted from the amino acid sequence. The relationship of AGP with this particular protein family indicates that its well-characterized ability to bind lipophilic drugs and certain steroids is a reflection of its true biological role. It is proposed that AGP and the other members of this extensive group of proteins should be designated lipocalins to reflect a common ability to bind lipophiles by enclosure within their structures in a manner that minimizes solvent contact.

20.