Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Antibodies that bind to protein surfaces of interest can be used to report the three-dimensional structure of the protein as follows: Proteins are composed of linear polypeptide chains that fold together in complex spatial patterns to create the native protein structure. These folded structures form binding sites for antibodies. Antibody binding sites are typically "assembled" on the protein surface from segments that are far apart in the primary amino acid sequence of the target proteins. Short amino acid probe sequences that bind to the active region of each antibody can be used as witnesses to the antibody epitope surface and these probes can be efficiently selected from random sequence peptide libraries. This paper presents a new method to align these antibody epitopes to discontinuous regions of the one-dimensional amino acid sequence of a target protein. Such alignments of the epitopes indicate how segments of the protein sequence must be folded together in space and thus provide long-range constraints for solving the 3-D protein structure. This new antibody-based approach is applicable to the large fraction of proteins that are refractory to current approaches for structure determination and has the additional advantage of requiring very small amounts of the target protein. The binding site of an antibody is a surface, not just a continuous linear sequence, so the epitope mapping alignment problem is outside the scope of classical string alignment algorithms, such as Smith-Waterman. We formalize the alignment problem that is at the heart of this new approach, prove that the epitope mapping alignment problem is NP-complete, and give some initial results using a branch-and-bound algorithm to map two real-life cases. Initial results for two validation cases are presented for a graph-based protein surface neighbor mapping procedure that promises to provide additional spatial proximity information for the amino acid residues on the protein surface.

2.
Discovery of local packing motifs in protein structures   Total citations: 1 (self-citations: 0, citations by others: 1)
We present a language for describing structural patterns of residues in protein structures and a method for the discovery of such patterns that recur in a set of protein structures. The patterns impose restrictions on the spatial position of each residue, their order along the amino acid chain, and which amino acids are allowed in each position. Unlike other methods for comparing sets of protein structures, our method is not based on the use of pairwise structure comparisons, which are often time-consuming and can produce inconsistent results. Instead, the method simultaneously takes into account information from all structures in the search for conserved structure patterns which are potential structure motifs. The method is based on describing the spatial neighborhoods of each residue in each structure as a string and applying a sequence pattern discovery method to find patterns common to subsets of these strings. Finally, it is checked whether the similarities between the neighborhood strings correspond to spatially similar substructures. We apply the method to analyze sets of very disparate proteins from four different protein families: serine proteases, cuprodoxins, cysteine proteinases, and ferredoxins. The motifs found by the method correspond well to the site and motif information given in the annotation of these proteins in PDB, Swiss-Prot, and PROSITE. Furthermore, the motifs are confirmed by using the motif data to constrain the structural alignment of the proteins obtained with the program SAP. This gave the best superposition/alignment of the proteins given the motif assignment.

3.
4.
The structural stability of a protein requires a large number of interresidue interactions. The energetic contribution of these can be approximated by low-resolution force fields extracted from known structures, based on observed amino acid pairing frequencies. The summation of such energies, however, cannot be carried out for proteins whose structure is not known or for intrinsically unstructured proteins. To overcome these limitations, we present a novel method for estimating the total pairwise interaction energy, based on a quadratic form in the amino acid composition of the protein. This approach is validated by the good correlation of the estimated and actual energies of proteins of known structure and by a clear separation of folded and disordered proteins in the energy space it defines. As the novel algorithm has not been trained on unstructured proteins, it substantiates the concept of protein disorder, i.e. that the inability to form a well-defined 3D structure is an intrinsic property of many proteins and protein domains. This property is encoded in their sequence, because their biased amino acid composition does not allow sufficient stabilizing interactions to form. By limiting the calculation to a predefined sequential neighborhood, the algorithm was turned into a position-specific scoring scheme that characterizes the tendency of a given amino acid to fall into an ordered or disordered region. This application we term IUPred and compare its performance with three generally accepted predictors, PONDR VL3H, DISOPRED2 and GlobPlot on a database of disordered proteins.
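
The quadratic form in the amino acid composition mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the IUPred implementation: the abstract gives no energy parameters, so the matrix `TOY_P` (hydrophobic pairs attract, everything else is neutral) and all function names below are made up purely for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the fraction of each of the 20 standard amino acids in seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts.get(a, 0) / len(seq) for a in AMINO_ACIDS]

def quadratic_energy_per_residue(seq, P):
    """Evaluate f^T P f, a quadratic form in the composition vector f."""
    f = composition(seq)
    n = len(AMINO_ACIDS)
    return sum(f[i] * P[i][j] * f[j] for i in range(n) for j in range(n))

# Hypothetical toy matrix (NOT real parameters): -1 for a pair of
# hydrophobic residues, 0 for every other pair.
HYDROPHOBIC = set("AFILMVWY")
TOY_P = [[-1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0
          for b in AMINO_ACIDS]
         for a in AMINO_ACIDS]
```

With this toy matrix, a sequence rich in hydrophobic residues scores lower (more stabilizing) than a purely charged one, which is the kind of separation the abstract describes. A real application would need a fitted 20 x 20 matrix.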

5.
Kaur H  Raghava GP 《FEBS letters》2004,564(1-2):47-57
In this study, an attempt has been made to develop a neural network-based method for predicting segments in proteins containing aromatic-backbone NH (Ar-NH) interactions using multiple sequence alignment. We have analyzed 3121 segments seven residues long containing Ar-NH interactions, extracted from 2298 non-redundant protein structures where no two proteins have more than 25% sequence identity. Two consecutive feed-forward neural networks with a single hidden layer have been trained with standard back-propagation as learning algorithm. The performance of the method improves from 0.12 to 0.15 in terms of Matthews correlation coefficient (MCC) value when evolutionary information (multiple alignment obtained from PSI-BLAST) is used as input instead of a single sequence. The performance of the method further improves from MCC 0.15 to 0.20 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields an overall prediction accuracy of 70.1% and an MCC of 0.20 when tested by five-fold cross-validation. Overall the performance is 15.2% higher than the random prediction. The method consists of two neural networks: (i) a sequence-to-structure network which predicts the aromatic residues involved in Ar-NH interaction from multiple alignment of protein sequences and (ii) a structure-to-structure network where the input consists of the output obtained from the first network and predicted secondary structure. Further, the actual position of the donor residue within the 'potential' predicted fragment has been predicted using a separate sequence-to-structure neural network. Based on the present study, a server Ar_NHPred has been developed which predicts Ar-NH interaction in a given amino acid sequence. The web server Ar_NHPred is available at and (mirror site).
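
For readers unfamiliar with the Matthews correlation coefficient (MCC) quoted in this abstract, the following Python sketch computes it from the four confusion-matrix counts using the standard definition. The function names, and the convention of returning 0.0 when the denominator vanishes, are choices made here for illustration and are not taken from the paper.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from true/false positive and negative counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention assumed here: undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which is why it can stay low even when the plain accuracy looks high on an unbalanced data set.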

6.
The serum (storage) proteins produced by insect larvae at the end of the feeding cycle are hexameric blood proteins with one or more types of subunits. The cDNA and gene structure of the aromatic amino acid-rich larval serum protein arylphorin from the tobacco hornworm, Manduca sexta, has been determined. In M. sexta arylphorin there are two subunits, alpha and beta, which have 686 and 687 amino acids, respectively, and whose amino acid sequences are 68% identical. The two genes, separated by 7.1 kilobases of chromosomal DNA, are transcribed in the same direction. Based on the alignment of the amino acid sequence, the rate of nucleotide substitution between the two coding regions predicts that the two genes diverged about 100 million years ago. Both genes contain 5 exons and the upstream region contains a sequence, TGATAAA, which is similar to a sequence found in all other storage protein genes for which information is available. When the National Biomedical Research Foundation protein sequence data base was searched, it was found that the arylphorin subunits showed significant similarity to the arthropod hemocyanins, which are hexameric oxygen-carrying proteins. Based on the alignment of the sequence of M. sexta arylphorin and the hemocyanin from the spiny lobster (Panulirus interruptus), for which a 3.2 Å structure has been determined, it was observed that the highest concentration of conserved residues was found in those regions of the sequence which are involved in subunit interactions in the hexameric protein. It is suggested that the insect storage proteins and the arthropod hemocyanins have evolved from a common ancestor.

7.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (greater than 45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% 'correct' alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
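
The idea of scoring each paired two-lettered code as a linear combination of an amino acid similarity and a secondary structure similarity can be sketched in Python as below. This is not STRALIGN itself: the identity-based similarities, the weight `w`, the linear gap penalty and all function names are assumptions made for illustration, and only the optimal global alignment score is returned, not the alignment.

```python
def annotate(seq, ss):
    """Pair each residue with its secondary structure label, e.g. ('A', 'H')."""
    if len(seq) != len(ss):
        raise ValueError("sequence and annotation must have equal length")
    return list(zip(seq, ss))

def pair_score(a, b, w=0.5):
    """Linear combination of residue and structure similarity (toy: identity)."""
    aa_sim = 1.0 if a[0] == b[0] else 0.0
    ss_sim = 1.0 if a[1] == b[1] else 0.0
    return w * aa_sim + (1.0 - w) * ss_sim

def align_score(x, y, w=0.5, gap=-1.0):
    """Global (Needleman-Wunsch style) alignment score of two annotated sequences."""
    n, m = len(x), len(y)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1], w),  # match/mismatch
                dp[i - 1][j] + gap,                                    # gap in y
                dp[i][j - 1] + gap,                                    # gap in x
            )
    return dp[n][m]
```

Setting `w=1.0` reduces the score to sequence information only, which mirrors the comparison the abstract makes between STRALIGN and a purely sequence-based dynamic programming alignment.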

8.
Protein structure and function at low temperatures   Total citations: 2 (self-citations: 0, citations by others: 2)
Proteins represent the major components in the living cell that provide the whole repertoire of constituents of cellular organization and metabolism. In the process of evolution, adaptation to extreme conditions mainly referred to temperature, pH and low water activity. With respect to life at low temperatures, effects on protein structure, protein stability and protein folding need consideration. The sequences and topologies of proteins from psychrophilic, mesophilic and thermophilic organisms are found to be highly homologous. Commonly, adaptive changes refer to multiple alterations of the amino acid sequence, which presently cannot be correlated with specific changes of structure and stability; so far it has not been possible to attribute specific increments in the free energy of stabilization to well-defined amino-acid exchanges in an unambiguous way. The stability of proteins is limited at high and low temperatures. Their expression and self-organization may be accomplished under conditions strongly deviating from optimum growth conditions. Molecular adaptation to extremes of temperature seems to be accompanied by a flattening of the temperature profile of the free energy of stabilization. In principle, the free energy of stabilization of proteins is small compared to the total molecular energy. As a consequence, molecular adaptation to extremes of physical conditions only requires marginal alterations of the intermolecular interactions and packing density. Careful statistical and structural analyses indicate that altering the number of ion pairs and hydrophobic interactions allows the flexibility of proteins to be adjusted so that full catalytic function is maintained at varying temperatures.

9.
Thompson J  Baker D 《Proteins》2011,79(8):2380-2388
Prediction of protein structures from sequences is a fundamental problem in computational biology. Algorithms that attempt to predict a structure from sequence primarily use two sources of information. The first source is physical in nature: proteins fold into their lowest energy state. Given an energy function that describes the interactions governing folding, a method for constructing models of protein structures, and the amino acid sequence of a protein of interest, the structure prediction problem becomes a search for the lowest energy structure. Evolution provides an orthogonal source of information: proteins of similar sequences have similar structure, and therefore proteins of known structure can guide modeling. The relatively successful Rosetta approach takes advantage of the first, but not the second source of information during model optimization. Following the classic work by Andrej Sali and colleagues, we develop a probabilistic approach to derive spatial restraints from proteins of known structure using advances in alignment technology and the growth in the number of structures in the Protein Data Bank. These restraints define a region of conformational space that is high-probability, given the template information, and we incorporate them into Rosetta's comparative modeling protocol. The combined approach performs considerably better on a benchmark based on previous CASP experiments. Incorporating evolutionary information into Rosetta is analogous to incorporating sparse experimental data: in both cases, the additional information eliminates large regions of conformational space and increases the probability that energy-based refinement will home in on the deep energy minimum at the native state.

10.
Vertebrate fibrinogen is a complex multidomained protein, the structure of which has been inferred mainly from electron microscopy and amino acid sequence studies. Among its most prominent features are two terminal globules, moieties that are mostly composed of the carboxyl-terminal two-thirds of the beta and gamma chains. Sequences homologous to the latter segments are found in several other animal proteins, always as the carboxyl-terminal contributions. An alignment of 15 amino acid sequences from various fibrinogens and related proteins has been used to make judgments about secondary structure. The nature of amino acids at each position in the alignment was used to distinguish alpha helices and beta structure on the one hand from loops and turns on the other, and the resulting assignments compared with predictions of secondary structure by other methods. Additionally, constraints imposed by the locations of cystines, carbohydrate attachment residues, and proteinase-sensitive points provided further insights into the general organization of the postulated secondary structures. Other ancillary data, including the effects of bound calcium and the locations of labeled or variant residues, were also considered. An intriguing similarity to a portion of the recently reported structure of a calcium-dependent lectin is noted.

11.
MOTIVATION: We propose a general method for deriving amino acid substitution matrices from low resolution force fields. Unlike current popular methods, the approach does not rely on evolutionary arguments or alignment of sequences or structures. Instead, residues are computationally mutated and their contribution to the total energy/score is collected. The average of these values over each position within a set of proteins results in a substitution matrix. RESULTS: Example substitution matrices have been calculated from force fields based on different philosophies and their performance compared with conventional substitution matrices. Although this can produce useful substitution matrices, the methodology highlights the virtues, deficiencies and biases of the source force fields. It also allows a rather direct comparison of sequence alignment methods with the score functions underlying protein sequence to structure threading. AVAILABILITY: Example substitution matrices are available from http://www.rsc.anu.edu.au/~zsuzsa/suppl/matrices.html. SUPPLEMENTARY INFORMATION: The list of proteins used for data collection and the optimized parameters for the alignment are given as supplementary material at http://www.rsc.anu.edu.au/~zsuzsa/suppl/matrices.html.

12.
Ge Y  Wu J  Xiao J  Yu J 《Journal of molecular modeling》2011,17(12):3183-3193
The α/β-type small acid soluble proteins (SASPs) are a major factor in protecting bacterial spores from being killed. In this article, we perform a systematic phylogenetic analysis of the α/β-type SASPs in the genus Geobacillus, which indicates that the whole family can be divided into three groups. We choose one protein from each group as a representative and construct the tertiary structure of these proteins. In order to explore the mechanism of protecting DNA from damage, 15 ns molecular dynamics simulations of the four complexes of Gsy3 with DNA are performed. The sequence alignment, model structure and binding energy analysis indicate that the helix2 region of SASPs is more conserved and plays a more crucial role in protecting DNA. Pairwise decomposition of the residue interaction energies demonstrates that Asn10, Lys24, Asn49, Ile52, Ile56, Thr57, Lys58, Arg59 and Val61 contribute most to the binding interaction. The differences in the energy contributions of these amino acids between the complexes lead us to conclude that the protein conformation changes slightly as more proteins bind to DNA, and that cooperative protein-protein interactions consequently occur.

13.
The information required to generate a protein structure is contained in its amino acid sequence, but how three-dimensional information is mapped onto a linear sequence is still incompletely understood. Multiple structure alignments of similar protein structures have been used to investigate conserved sequence features but contradictory results have been obtained, due, in large part, to the absence of objective criteria to be used in the construction of sequence profiles and in the quantitative comparison of alignment results. Here, we report a new procedure for multiple structure alignment and use it to construct structure-based sequence profiles for similar proteins. The definition of "similar" is based on the structural alignment procedure and on the protein structural distance (PSD) described in paper I of this series, which offers an objective measure for protein structure relationships. Our approach is tested in two well-studied groups of proteins: serine proteases and Ig-like proteins. It is demonstrated that the quality of a sequence profile generated by a multiple structure alignment is quite sensitive to the PSD used as a threshold for the inclusion of proteins in the alignment. Specifically, if the proteins included in the aligned set are too distant in structure from one another, there will be a dilution of information and patterns that are relevant to a subset of the proteins are likely to be lost. In order to understand better how the same three-dimensional information can be encoded in seemingly unrelated sequences, structure-based sequence profiles are constructed for subsets of proteins belonging to nine superfolds. We identify patterns of relatively conserved residues in each subset of proteins. It is demonstrated that the most conserved residues are generally located in the regions where tertiary interactions occur and that are relatively conserved in structure.
Nevertheless, the conservation patterns are relatively weak in all cases studied, indicating that structure-determining factors that do not require a particular sequential arrangement of amino acids, such as secondary structure propensities and hydrophobic interactions, are important in encoding protein fold information. In general, we find that similar structures can fold without having a set of highly conserved residue clusters or a well-conserved sequence profile; indeed, in some cases there is no apparent conservation pattern common to structures with the same fold. Thus, when a group of proteins exhibits a common and well-defined sequence pattern, it is more likely that these sequences have a close evolutionary relationship rather than the similarities having arisen from the structural requirements of a given fold.

14.
Extended proteins such as calmodulin and troponin C have two globular terminal domains linked by a central region that is exposed to water and often acts as a function-regulating element. The mechanisms that stabilize the tertiary structure of extended proteins appear to differ greatly from those of globular proteins. Identifying such differences in physical properties of amino acid sequences between extended proteins and globular proteins can provide clues useful for identification of extended proteins from complete genomes including orphan sequences. In the present study, we examined the structure and amino acid sequence of extended proteins. We found that extended proteins have a large net electric charge, high charge density, and an even balance of charge between the terminal domains, indicating that electrostatic interaction is a dominant factor in stabilization of extended proteins. Additionally, the central domain exposed to water contained many amphiphilic residues. Extended proteins can be identified from these physical properties of the tertiary structure, which can be deduced from the amino acid sequence. Analysis of physical properties of amino acid sequences can provide clues to the mechanism of protein folding. Also, structural changes in extended proteins may be caused by formation of molecular complexes. Long-range effects of electrostatic interactions also appear to play important roles in structural changes of extended proteins.
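
The sequence-derived charge properties named in this abstract can be approximated with a few lines of Python. The abstract does not define them precisely, so the definitions below are assumptions: Lys and Arg count as +1, Asp and Glu as -1, histidine and the chain termini are ignored, charge density is the fraction of charged residues, and the "balance" simply compares the net charge of the two sequence halves rather than of real structural domains.

```python
POSITIVE = set("KR")   # assumed +1 at neutral pH
NEGATIVE = set("DE")   # assumed -1 at neutral pH

def net_charge(seq):
    """Number of positive residues minus number of negative residues."""
    pos = sum(1 for a in seq if a in POSITIVE)
    neg = sum(1 for a in seq if a in NEGATIVE)
    return pos - neg

def charge_density(seq):
    """Fraction of residues that carry a charge of either sign."""
    if not seq:
        raise ValueError("sequence must not be empty")
    charged = sum(1 for a in seq if a in POSITIVE or a in NEGATIVE)
    return charged / len(seq)

def half_charges(seq):
    """Net charge of the N-terminal and C-terminal halves (crude domain proxy)."""
    mid = len(seq) // 2
    return net_charge(seq[:mid]), net_charge(seq[mid:])
```

A screen for candidate extended proteins along these lines would still need thresholds, which the abstract does not state.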

15.
Prediction of the location of structural domains in globular proteins   Total citations: 7 (self-citations: 0, citations by others: 7)
The location of structural domains in proteins is predicted from the amino acid sequence, based on the analysis of a computed contact map for the protein, the average distance map (ADM). Interactions between residues i and j in a protein are subdivided into several ranges, according to the separation |i-j| in the amino acid sequence. Within each range, average spatial distances between every pair of amino acid residues are computed from a data base of known protein structures. Infrequently occurring pairs are omitted as being statistically insignificant. The average distances are used to construct a predicted ADM. The ADM is analyzed for the occurrence of regions with high densities of contacts (compact regions). Locations of rapid changes of density between various parts of the map are determined by the use of scanning plots of contact densities. These locations serve to pinpoint the distribution of compact regions. This distribution, in turn, is used to predict boundaries of domains in the protein. The technique provides an objective method for the location of domains both on a contact map derived from a known three-dimensional protein structure, the real distance map (RDM), and on an ADM. While most other published methods for the identification of domains locate them in the known three-dimensional structure of a protein, the technique presented here also permits the prediction of domains in proteins of unknown spatial structure, as the construction of the ADM for a given protein requires knowledge of only its amino acid sequence.

16.
The N-terminal half of the alpha-domain (residues 1 to 34) is more important for the stability of the acid-induced molten globule state of alpha-lactalbumin than the C-terminal half (residues 86 to 123). The refolding and unfolding kinetics of a chimera, in which the amino acid sequence of residues 1 to 34 was from human alpha-lactalbumin and the remainder of the sequence from bovine alpha-lactalbumin, were studied by stopped-flow tryptophan fluorescence spectroscopy. The chimeric protein refolded and unfolded substantially faster than bovine alpha-lactalbumin. The stability of the molten globule state formed by the chimera was greater than that of bovine alpha-lactalbumin, and the hydrophobic surface area buried inside of the molecule in the molten globule state was increased by the substitution of residues 1 to 34. Peptide fragments corresponding to the A- and B-helix of the chimera showed higher helix propensity than those of the bovine protein, indicating the contribution of local interactions to the high stability of the molten globule state of the chimera. Moreover, the substitution of residues 1-34 decreased the free energy level of the transition state and increased hydrophobic surface area buried inside of the molecule in the transition state. Our results indicate that local interactions as well as hydrophobic interactions formed in the molten globule state are important in guiding the subsequent structural formation of alpha-lactalbumin.

17.
The lack of crystal structure data for folate binding proteins has left many questions unanswered (for example, which residues are important in the active site and binding domain, and which amino acid residues are involved in interactions between ligand and receptor). With sequence alignment and PROSITE motif identification, we attempted to identify evolutionarily significant residues that are of functional importance for ligand binding and that form catalytic sites. We have analyzed 46 different FR and FBP sequences of various organisms obtained from GenBank. Multiple sequence alignment identified 44 highly conserved identical amino acid residues with 10 cysteine residues and 12 motifs including ECSPNLGPW (which might help in the structural stability of FR).

18.
The molten globule state of alpha-lactalbumin has ordered secondary structure in the alpha-domain, which comprises residues 1 to 34 and 86 to 123. In order to investigate which part of a polypeptide is important for stabilizing the molten globule state of alpha-lactalbumin, we have produced and studied three chimeric proteins of bovine and human alpha-lactalbumin. The stability of the molten globule state formed by domain-exchanged alpha-lactalbumin, in which the amino acid sequence in the alpha-domain comes from human alpha-lactalbumin and that in the beta-domain comes from bovine alpha-lactalbumin, is the same as that of human alpha-lactalbumin and is substantially greater than that of bovine alpha-lactalbumin. Therefore, our results show that the stability of the molten globule state of alpha-lactalbumin is determined by the alpha-domain and the beta-domain is not important for stabilizing the molten globule state. The substitution of residues 1 to 34 of bovine alpha-lactalbumin with those of human alpha-lactalbumin substantially increases the stability of the molten globule state, while the substitution of residues 86 to 123 of bovine alpha-lactalbumin with those of human alpha-lactalbumin decreases the stability of the molten globule state. Therefore, residues 1 to 34 in human alpha-lactalbumin are more important for the stability of the human alpha-lactalbumin molten globule state than residues 86 to 123. The stabilization of the molten globule state due to substitution of both residues 1 to 34 and 86 to 123 is not identical with the sum of the two individual substitutions, demonstrating the non-additivity of the stabilization of the molten globule state. This result indicates that there is a long-range interaction between residues 1 to 34 and 86 to 123 in the molten globule state of human alpha-lactalbumin.
The differences in the stabilities of the molten globule states are well correlated with the averaged helical propensity values in the alpha-domain when the long-range interactions are negligible, suggesting that the local interaction is the dominant term for determining the stability of the molten globule state. Our results also indicate that the apparent cooperativity is closely linked to the stability of the molten globule state, even if the molten globule state is weakly cooperative.

19.
Wang J  Feng JA 《Proteins》2005,58(3):628-637
Sequence alignment has become one of the essential bioinformatics tools in biomedical research. Existing sequence alignment methods can produce reliable alignments for homologous proteins sharing a high percentage of sequence identity. The performance of these methods deteriorates sharply for the sequence pairs sharing less than 25% sequence identity. We report here a new method, NdPASA, for pairwise sequence alignment. This method employs neighbor-dependent propensities of amino acids as a unique parameter for alignment. The values of neighbor-dependent propensity measure the preference of an amino acid pair adopting a particular secondary structure conformation. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. Using superpositions of homologous proteins derived from the PSI-BLAST analysis and the Structural Classification of Proteins (SCOP) classification of a nonredundant Protein Data Bank (PDB) database as a gold standard, we show that NdPASA has improved pairwise alignment. Statistical analyses of the performance of NdPASA indicate that the introduction of sequence patterns of secondary structure derived from neighbor-dependent sequence analysis clearly improves alignment performance for sequence pairs sharing less than 20% sequence identity. For sequence pairs sharing 13-21% sequence identity, NdPASA improves the accuracy of alignment over the conventional global alignment (GA) algorithm using the BLOSUM62 by an average of 8.6%. NdPASA is most effective for aligning query sequences with template sequences whose structure is known. NdPASA can be accessed online at http://astro.temple.edu/feng/Servers/BioinformaticServers.htm.

20.
As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or of accepting insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as equally probable events. We have analyzed the alignment patterns for homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as insertion/deletion (indel) frequency arrays (IFAs). By applying IFAs to the protein threading problem, we have been able to improve the alignment accuracy, especially for proteins with low sequence identity. We have also demonstrated that the application of this information can lead to an improvement in fold recognition.
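
One plausible way to tabulate position-specific indel statistics of the kind described in this abstract is sketched below in Python. The abstract does not specify how IFAs are computed, so everything here is an assumption: the input is one multiple alignment with the target as its own row and `-` as the gap character, a deletion is a homolog gap opposite a target residue, and an insertion is counted once per homolog per run of target gaps. The function name is hypothetical.

```python
def indel_frequency_arrays(target, homologs):
    """Return (deletion_freq, insertion_freq) relative to the target row.

    deletion_freq[k]  : fraction of homologs lacking target residue k.
    insertion_freq[k] : fraction of homologs with extra residues just
                        before target residue k (last slot: after the end).
    """
    if not homologs:
        raise ValueError("at least one homolog row is required")
    n_pos = sum(1 for c in target if c != "-")
    deletions = [0] * n_pos
    insertions = [0] * (n_pos + 1)
    for row in homologs:
        if len(row) != len(target):
            raise ValueError("all rows must have the alignment length")
        pos = 0                  # index of the next target residue
        counted = False          # insertion already counted in this gap run?
        for t, h in zip(target, row):
            if t == "-":
                if h != "-" and not counted:
                    insertions[pos] += 1
                    counted = True
            else:
                if h == "-":
                    deletions[pos] += 1
                pos += 1
                counted = False
    n = len(homologs)
    return [d / n for d in deletions], [i / n for i in insertions]
```

Arrays like these could then be turned into position-dependent gap penalties in a threading or alignment score, so that gaps are cheaper where homologs show frequent indels.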
