Similar literature
1.
Abstract

Conserved protein sequence segments are commonly believed to correspond to functional sites in the protein sequence. A novel approach is proposed to profile the changing degree of conservation along the protein sequence by evaluating the occurrence frequencies of all short oligopeptides of the given sequence in a large proteome database. Thus, a protein sequence conservation profile can be plotted for every protein. The profile indicates where along the sequence the potential functional (conserved) sites are located. The corresponding oligopeptides belonging to the sites are very frequent across many prokaryotic species. Analysis of a representative set of such profiles reveals a common feature of all examined proteins: they consist of sequence modules represented by the peaks of conservation. The typical size of the modules (peak-to-peak distance) is 25–30 amino acid residues.
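The profiling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the window length `k`, the per-residue averaging over covering windows, and the function names `kmer_counts` and `conservation_profile` are all assumptions made for illustration, and the "proteome database" is just a list of strings.

```python
from collections import Counter


def kmer_counts(proteome, k):
    """Count every oligopeptide of length k in a list of protein sequences."""
    counts = Counter()
    for seq in proteome:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def conservation_profile(query, counts, k):
    """Per-residue score: mean database count of the k-mers covering that residue.

    Averaging over covering windows is an illustrative choice, not taken
    from the abstract.
    """
    n = len(query)
    totals = [0] * n   # summed counts of the windows covering each position
    covers = [0] * n   # number of windows covering each position
    for i in range(n - k + 1):
        c = counts[query[i:i + k]]   # a Counter returns 0 for unseen k-mers
        for j in range(i, i + k):
            totals[j] += c
            covers[j] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, covers)]


# Toy data (hypothetical sequences, not a real proteome).
toy_proteome = ["ACDEF", "ACDGH"]
toy_counts = kmer_counts(toy_proteome, 3)
profile = conservation_profile("ACDEF", toy_counts, 3)
```

Peaks in `profile` would mark the candidate conserved modules; on real data the database would be a large set of proteomes and the profile would typically be smoothed before peak-to-peak distances are measured.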

2.
The prediction of functional sites in newly solved protein structures is a challenge for computational structural biology. Most methods for approaching this problem use evolutionary conservation as the primary indicator of the location of functional sites. However, sequence conservation reflects not only evolutionary selection at functional sites to maintain protein function, but also selection throughout the protein to maintain the stability of the folded state. To disentangle sequence conservation due to protein functional constraints from sequence conservation due to protein structural constraints, we use all-atom computational protein design methodology to predict sequence profiles expected under solely structural constraints, and to compute the free energy difference between the naturally occurring amino acid and the lowest free energy amino acid at each position. We show that functional sites are more likely than non-functional sites to have computed sequence profiles which differ significantly from the naturally occurring sequence profiles and to have residues with sub-optimal free energies, and that incorporation of these two measures improves sequence-based prediction of protein functional sites. The combined sequence- and structure-based functional site prediction method has been implemented in a publicly available web server.

3.
From protein sequence space to elementary protein modules   Total citations: 2, self-citations: 0, citations by others: 2
Frenkel ZM, Trifonov EN. Gene, 2008, 408(1-2): 64-71
The formatted protein sequence space is built from identical-size fragments of prokaryotic proteins (112 complete proteomes). Connecting sequence-wise similar fragments (points in the space) results in the formation of numerous networks that sometimes combine different types of proteins which nevertheless share fragments with similar or distantly related sequences. The networks are mapped on individual protein sequences, revealing distinct regions (modules) associated with prominent networks with well-defined functional identities. The presence of multiple sites of sequence conservation (modules) in a given protein sequence suggests that the annotated protein function may be decomposed into "elementary" subfunctions of the respective modules. The modules correspond to previously discovered conserved closed loop structures and their sequence prototypes.

4.
The conservation profile of a protein is a curve of the conservation levels of amino acids along the sequence. Biologists are usually more interested in individual points on the curve (namely, the conserved amino acids) than the overall shape of the curve. Here, we show that the conservation curves of proteins bear the imprints of molecules that are evolutionarily coupled to the proteins. Our method is based on recent studies showing that a sequence conservation profile is quantitatively linked to its structural packing profile. We find that the conservation profiles of nucleic acid (NA) binding proteins are better correlated with the packing profiles of the protein–NA complexes than those of the proteins alone. This indicates that a nucleic acid binding protein evolves to accommodate the nucleic acid in such a way that the residues involved in binding have their conservation levels closely coupled with the specific nucleotides. Proteins 2015; 83:1407–1413. © 2015 Wiley Periodicals, Inc.

5.
6.
We have recently shown that the weighted contact number profiles (or the packing density profiles) of proteins are well correlated with the corresponding sequence conservation profiles. The results suggest that a protein structure may contain sufficient information about sequence conservation comparable to that derived from multiple homologous sequences. However, there are ambiguities concerning how to compute the packing density of the subunit of a protein complex. For a subunit of a complex, there are two ways to compute its packing density: one including the packing contributions of the other subunits and the other excluding their contributions. Here we selected two sets of enzyme complexes. Set A contains complexes with the active sites comprising residues from multiple subunits, while Set B contains those with the active sites residing on single subunits. In Set A, if the packing density profile of a subunit is computed considering the contributions of the other subunits of the complex, it will agree better with the sequence conservation profile. But in Set B the situation is reversed. The results may be due to the stronger functional and structural constraints on the evolution processes on the complexes of Set A than those of Set B to maintain the enzymatic functions of the complexes. The comparison of the packing density and the sequence conservation profiles may provide a simple yet potentially useful way to understand the structural and evolutionary couplings between the subunits of protein complexes. Proteins 2013; 81:1192–1199. © 2013 Wiley Periodicals, Inc.
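The two ways of computing a subunit's packing density can be made concrete with a small sketch. The abstract does not state the formula, so the inverse-square form used here (each residue's weighted contact number is the sum of 1/r² over all other residues) is an assumption based on the commonly used definition, and the function name `weighted_contact_numbers` and the toy coordinates are hypothetical.

```python
def weighted_contact_numbers(coords, others=()):
    """Weighted contact number for each residue of one subunit.

    coords: (x, y, z) positions, e.g. C-alpha atoms, of the subunit itself.
    others: positions belonging to the other subunits of the complex;
            leave empty to exclude their packing contributions.
    Assumes the inverse-square form: WCN_i = sum over j != i of 1 / r_ij**2,
    and that no two positions coincide.
    """
    env = list(coords) + list(others)   # subunit residues come first
    result = []
    for i, (xi, yi, zi) in enumerate(coords):
        total = 0.0
        for j, (xj, yj, zj) in enumerate(env):
            if j == i:                  # skip the residue itself
                continue
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            total += 1.0 / d2
        result.append(total)
    return result


# Toy coordinates (hypothetical, in arbitrary length units).
subunit = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
partner = [(0.0, 1.0, 0.0)]
wcn_alone = weighted_contact_numbers(subunit)
wcn_in_complex = weighted_contact_numbers(subunit, partner)
```

Comparing how well `wcn_alone` and `wcn_in_complex` each correlate with a conservation profile is the kind of test the abstract describes for Set A and Set B.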

7.
8.
Vascular endothelial growth factor (VEGF) is a potent angiogenic factor whose mRNA expression is induced by hypoxia. This induction is due in large part to an increase in the stability of its mRNA. The RNA sequences and cognate proteins responsible for this increased stability with hypoxia are not well understood. In order to identify regions of functional importance in the 3′UTR of VEGF mRNA, we have sequenced the human VEGF 3′UTR and compared it to the rat sequence. Overall sequence homology was 82% with complete conservation of all four potential polyadenylation signals and both nonameric instability elements. Five hypoxia-inducible RNA protein-binding (HI-RPB) sites were identified by RNA electromobility shift assay (EMSA) in the human and rat genes. EMSA and competition studies suggest that these sites bind a similar or related protein complex. On average, the five sites were 95% conserved at the nucleotide level between the rat and corresponding human sequence. This conservation, taken together with several previously described, independent correlations between the presence of these RNA-protein complexes and an increase in VEGF mRNA stability, suggests an important functional role for these sites in mediating hypoxia-inducible VEGF mRNA stability.

9.
A universal scale of sequence conservation has been recently introduced based on the omnipresence of protein sequence motifs across species. A large spectrum of short sequences, up to eight residues, has been found to reside in all or almost all prokaryotic organisms. By this discovery a principally novel quantitative approach is introduced to the problem of reconstruction of the last universal common ancestor (LUCA). The most conserved elements (protein modules) with defined structures and sequences harboring the omnipresent motifs are outlined in this work by combining the sequence and protein crystal structure data. The structurally conserved modules involve 25–30 amino acid residues and have the appearance of closed loops (loop-n-lock structures). This confirms earlier conclusions on the loop-fold structure of globular proteins. Many of the topmost conserved modules represent the primary closed loop prototypes that have been derived by whole genome sequence searches. The data presented thus make a basis for further developments toward the earliest stages of protein evolution. [Reviewing Editor: Dr. Martin Kreitman]

10.
An artificial cell adhesive protein could be engineered by grafting the RGDS tetrapeptide, the core sequence of the major cell adhesive site of fibronectin, to a truncated form of Staphylococcal protein A (tSPA) via cassette mutagenesis of the tSPA expression vector pRIT2T [T. Maeda et al. (1989) J. Biol. Chem. 264, 15165-15168]. We synthesized a panel of tSPA derivatives grafted with various RGDS-containing oligopeptides to address the problem of how the cell adhesive activity of the resulting tSPA derivatives was affected by the length and amino acid sequence of the grafted oligopeptides and by the sites on tSPA where the extra oligopeptides were inserted. The results showed that (i) the amino acid residues flanking the RGDS core sequence played a key role in modulating the cell adhesive activity of the grafted RGDS signal; (ii) at least two sites on tSPA, each corresponding to one of the two HindIII sites of pRIT2T, were competent in sustaining the cell adhesive activity of the grafted signal; and (iii) the divalent tSPA containing the RGDS signal at both sites was more active than monovalent derivatives containing only one signal at either site. These results provide a strategic basis for engineering of artificial cell adhesive proteins by grafting the RGDS signal.

11.
MOTIVATION: Due to the growing number of completely sequenced genomes, functional annotation of proteins becomes a more and more important issue. Here, we describe a method for the prediction of sites within protein domains which are part of protein-ligand interactions. As recently demonstrated, these sites are not trivial to detect because of a varying degree of conservation of their location and type within a domain family. RESULTS: The developed method for the prediction of protein-ligand interaction sites is based on a newly defined interaction profile hidden Markov model (ipHMM) topology that takes structural and sequence data into account. It is based on a homology search via a posterior decoding algorithm that yields probabilities for interacting sequence positions and inherits the efficiency and the power of the profile hidden Markov model (pHMM) methodology. The algorithm enhances the quality of interaction site predictions and is a suitable tool for large-scale studies, as has already been demonstrated for pHMMs. AVAILABILITY: The MATLAB files are available on request from the first author.

12.
IgGs from patients with multiple sclerosis and systemic lupus erythematosus (SLE) purified on MBP-Sepharose, in contrast to canonical proteases, effectively hydrolyze only myelin basic protein (MBP), but not many other tested proteins. Here we have shown for the first time that anti-MBP SLE IgGs hydrolyze nonspecific tri- and tetrapeptides with an extremely low efficiency and cannot effectively hydrolyze longer 20-mer nonspecific oligopeptides corresponding to antigenic determinants (AGDs) of HIV-1 integrase. At the same time, anti-MBP SLE IgGs efficiently hydrolyze oligopeptides corresponding to AGDs of MBP. All sites of IgG-mediated proteolysis of 21- and 25-mer encephalitogenic oligopeptides corresponding to two known AGDs of MBP were found by a combination of reverse-phase chromatography, TLC, and MALDI spectrometry. Several clustered major, moderate, and minor sites of cleavage were revealed in the case of 21- and 25-mer oligopeptides. The active sites of anti-MBP abzymes are localised on their light chains, while heavy chains are responsible for the affinity of protein substrates. Interactions of intact globular proteins with both light and heavy chains of abzymes provide high affinity to MBP and specificity of this protein hydrolysis. The affinity of anti-MBP abzymes for intact MBP is approximately 1000-fold higher than for the oligopeptides. The data suggest that all oligopeptides interact mainly with the light chains of different monoclonal abzymes of the total pool of IgGs, which possesses a lower affinity for substrates, and therefore, depending on the oligopeptide sequences, their hydrolysis may be less specific than that of globular proteins and can occur at several sites.

13.
Understanding and characterizing the biochemical and evolutionary information within the wealth of protein sequence and structural data, particularly at functionally important sites, is very important. A comprehensive analysis of physico-chemical properties and evolutionary conservation patterns at the molecular and biological function level is expected to yield important clues for identifying similar sites in as-yet uncharacterized proteins. We present a library of protein functional templates (PFTs) designed to represent the compositional and evolutionary conservation patterns of functional sites at the molecular and biological function level. Subsequently we developed LIMACS (LInear MAtching of Conservation Scores), a software tool that uses the template library for the prediction of functionally important sites in a multiple sequence alignment, transferring the molecular function annotation from the most-similar functional site in the template library to a predicted site.

14.
Domains are the building blocks of proteins and play a crucial role in protein-protein interactions. Here, we propose a new approach for the analysis and prediction of domain-domain interfaces. Our method, which relies on the representation of domains as residue-interacting networks, finds an optimal decomposition of domain structures into modules. The resulting modules comprise highly cooperative residues, which exhibit few connections with other modules. We found that non-overlapping binding sites in a domain, involved in different domain-domain interactions, are generally contained in different modules. This observation indicates that our modular decomposition is able to separate protein domains into regions with specialized functions. Our results show that modules with high modularity values identify binding site regions, demonstrating the predictive character of modularity. Furthermore, the combination of modularity with other characteristics, such as sequence conservation or surface patches, was found to improve our predictions. In an attempt to give a physical interpretation to the modular architecture of domains, we analyzed in detail six examples of protein domains with available experimental binding data. The modular configuration of the TEM1-beta-lactamase binding site illustrates the energetic independence of hotspots located in different modules and the cooperativity of those sited within the same modules. The energetic and structural cooperativity between intramodular residues is also clearly shown in the example of the chymotrypsin inhibitor, where non-binding site residues have a synergistic effect on binding. Interestingly, the binding site of the T cell receptor beta chain variable domain 2.1 is contained in one module, which includes structurally distant hot regions displaying positive cooperativity. These findings support the idea that modules possess certain functional and energetic independence. 
A modular organization of binding sites confers robustness and flexibility to the performance of the functional activity, and facilitates the evolution of protein interactions.

15.
Castle JC. PLoS ONE, 2011, 6(6): e20660
Rates of SNPs (single nucleotide polymorphisms) and cross-species genomic sequence conservation reflect intra- and inter-species variation, respectively. Here, I report SNP rates and genomic sequence conservation adjacent to mRNA processing regions and show that, as expected, more SNPs occur in less conserved regions and that functional regions have fewer SNPs. Results are confirmed using both mouse and human data. Regions include protein start codons, 3' splice sites, 5' splice sites, protein stop codons, predicted miRNA binding sites, and polyadenylation sites. Throughout, SNP rates are lower and conservation is higher at regulatory sites. Within coding regions, SNP rates are highest and conservation is lowest at codon position three and the fewest SNPs are found at codon position two, reflecting codon degeneracy for amino acid encoding. Exon splice sites show high conservation and very low SNP rates, reflecting both splicing signals and protein coding. Relaxed constraint on the codon third position is dramatically seen when separating exonic SNP rates based on intron phase. At polyadenylation sites, a peak of conservation and low SNP rate occurs from 30 to 17 nt preceding the site. This region is highly enriched for the sequence AAUAAA, reflecting the location of the conserved polyA signal. miRNA 3' UTR target sites are predicted incorporating interspecies genomic sequence conservation; SNP rates are low in these sites, again showing fewer SNPs in conserved regions. Together, these results confirm that SNPs, reflecting recent genetic variation, occur more frequently in regions with less evolutionary conservation.

16.
17.
Twenty-seven protein sequence elements, six to nine amino acids long, were extracted from 15 phylogenetically diverse complete prokaryotic proteomes. The elements are present in all of these proteomes, with at least one copy each (omnipresent elements), and have presumably been conserved since the last universal common ancestor (LUCA). All these omnipresent elements are identified in crystallized protein structures as parts of highly conserved closed loops, 25–30 residues long, thus representing the closed-loop modules discovered in 2000 by Berezovsky et al. The omnipresent peptides make up seven distinct groups, of which the largest groups, Aleph and Beth, contain 18 and four elements, respectively, which are related but different, while five other groups are represented by only one element each. The LUCA modules appear with one or several copies per protein molecule in a variety of combinations depending on the functional identity of the corresponding protein. The functional involvement of individual LUCA modules is outlined on the basis of known protein annotations. Analyses of all the related sequences in a large, formatted protein sequence space suggest that many, if not all, of the 27 omnipresent elements have a common sequence origin. This sequence space network analysis may lead to elucidation of the earliest stages of protein evolution.

18.
Domains are the building blocks of proteins and play a crucial role in protein–protein interactions. Here, we propose a new approach for the analysis and prediction of domain–domain interfaces. Our method, which relies on the representation of domains as residue-interacting networks, finds an optimal decomposition of domain structures into modules. The resulting modules comprise highly cooperative residues, which exhibit few connections with other modules. We found that non-overlapping binding sites in a domain, involved in different domain–domain interactions, are generally contained in different modules. This observation indicates that our modular decomposition is able to separate protein domains into regions with specialized functions. Our results show that modules with high modularity values identify binding site regions, demonstrating the predictive character of modularity. Furthermore, the combination of modularity with other characteristics, such as sequence conservation or surface patches, was found to improve our predictions. In an attempt to give a physical interpretation to the modular architecture of domains, we analyzed in detail six examples of protein domains with available experimental binding data. The modular configuration of the TEM1-β-lactamase binding site illustrates the energetic independence of hotspots located in different modules and the cooperativity of those sited within the same modules. The energetic and structural cooperativity between intramodular residues is also clearly shown in the example of the chymotrypsin inhibitor, where non–binding site residues have a synergistic effect on binding. Interestingly, the binding site of the T cell receptor β chain variable domain 2.1 is contained in one module, which includes structurally distant hot regions displaying positive cooperativity. These findings support the idea that modules possess certain functional and energetic independence. 
A modular organization of binding sites confers robustness and flexibility to the performance of the functional activity, and facilitates the evolution of protein interactions.

19.
A. A. Zamyatnin. Biophysics, 2008, 53(5): 329-335
The term fragmentomics is substantiated and defined. Theoretical structure-function analysis of all possible fragments of a protein molecule was performed under the concept of fragmentomics to determine the regions that could be potential sources of regulatory oligopeptides. For this purpose, we used the data on the primary structure of bovine hemoglobin, the information contained in the EROP-Moscow database on the structures and functions of natural oligopeptides, and a specialized software package. This analysis revealed natural nonhemoglobin oligopeptides containing hemoglobin fragments and natural oligopeptides with a structure precisely coinciding with hemoglobin fragments. The most abundant of them are neuropeptides, antimicrobial oligopeptides, and hormones. It was demonstrated that the tetrapeptide and larger fragments of hemoglobin identified in nonhemoglobin oligopeptides and possessing one of the mentioned activities are present in the amino acid sequences of experimentally determined hemoglobin oligopeptides with the same function. The proposed approach allowed us to discover new potentially active sites in the hemoglobin amino acid sequence not yet studied experimentally. The possibility of natural formation of regulatory oligopeptides from hemoglobin molecules and other food proteins is discussed, as well as the generation of an exogenous oligopeptide pool in the gastrointestinal tract, and how the results match the concept of a natural continuum of regulatory oligopeptides.
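The fragment analysis described here can be sketched as an exhaustive enumeration followed by a lookup against known oligopeptides. This is only an illustration under stated assumptions: the function names `fragments` and `match_known` are hypothetical, the minimum length of four follows the abstract's mention of tetrapeptides, and the toy sequences stand in for the hemoglobin sequence and the EROP-Moscow records, which are not reproduced here.

```python
def fragments(seq, min_len, max_len):
    """Yield (start, fragment) for every contiguous fragment in the length range."""
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            yield start, seq[start:start + length]


def match_known(seq, known, min_len=4):
    """Find fragments of seq that coincide with, or occur inside, known oligopeptides.

    known: dict mapping an oligopeptide name to its sequence (a stand-in
    for a database of natural oligopeptides).
    """
    longest = max((len(p) for p in known.values()), default=0)
    hits = []
    for start, frag in fragments(seq, min_len, min(longest, len(seq))):
        for name, pep in known.items():
            if frag == pep:
                hits.append((start, frag, name, "identical"))
            elif frag in pep:
                hits.append((start, frag, name, "contained"))
    return hits


# Toy data (hypothetical sequences, not real hemoglobin or database entries).
toy_protein = "MVLSPADK"
toy_known = {"pepA": "LSPA", "pepB": "GGSPADG"}
toy_hits = match_known(toy_protein, toy_known)
```

The two hit types mirror the two cases in the abstract: oligopeptides whose structure coincides exactly with a fragment, and oligopeptides that merely contain one.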

20.
A sensitive technique for protein sequence motif recognition based on neural networks has been developed. It involves three major steps. (1) At each appropriate alignment position of a set of N matched sequences, a set of N aligned oligopeptides is specified with preselected window length. N neural nets are subsequently and successively trained on N-1 amino acid spans after eliminating each ith oligopeptide. A test for recognition of each of the ith spans is performed. The average neural net recognition over N such trials is used as a measure of conservation for the particular windowed region of the multiple alignment. This process is repeated for all possible spans of given length in the multiple alignment. (2) The M most conserved regions are regarded as motifs and the oligopeptides within each are used to train intensively M individual neural networks. (3) The M networks are then applied in a search for related primary structures in a databank of known protein sequences. The oligopeptide spans in the database sequence with strongest neural net output for each of the M networks are saved and then scored according to the output signals and the proper combination that follows the expected N- to C-terminal sequence order. The motifs from the database with highest similarity scores can then be used to retrain the M neural nets, which can be subsequently utilized for further searches in the databank, thus providing even greater sensitivity to recognize distant familial proteins. This technique was successfully applied to the integrase, DNA-polymerase and immunoglobulin families.
