Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
Automatic methods for predicting functionally important residues   Total citations: 9, self-citations: 0, citations by others: 9
Sequence analysis is often the first guide for the prediction of residues in a protein family that may have functional significance. A few methods have been proposed which use the division of protein families into subfamilies in the search for those positions that could have some functional significance for the whole family, but at the same time which exhibit the specificity of each subfamily ("Tree-determinant residues"). However, there are still many unsolved questions, such as the best division of a protein family into subfamilies, or the accurate detection of sequence variation patterns characteristic of different subfamilies. Here we present a systematic study in a significant number of protein families, testing the statistical meaning of the Tree-determinant residues predicted by three different methods that represent the range of available approaches. The first method takes as a starting point a phylogenetic representation of a protein family and, following the principle of Relative Entropy from Information Theory, automatically searches for the optimal division of the family into subfamilies. The second method looks for positions whose mutational behavior is reminiscent of the mutational behavior of the full-length proteins, by directly comparing the corresponding distance matrices. The third method is an automation of the analysis of distribution of sequences and amino acid positions in the corresponding multidimensional spaces using a vector-based principal component analysis. These three methods have been tested on two non-redundant lists of protein families: one composed of proteins that bind a variety of ligand groups, and the other composed of proteins with annotated functionally relevant sites. In most cases, the residues predicted by the three methods show a clear tendency to be close to bound ligands of biological relevance and to those amino acids described as participants in key aspects of protein function.
These three automatic methods provide a wide range of possibilities for biologists to analyze their families of interest, in a similar way to the one presented here for the family of proteins related to ras-p21.
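
As a rough illustration of the relative-entropy idea behind the first method (a sketch, not the authors' implementation), the code below scores a single alignment column by the Kullback-Leibler divergence between the residue distribution inside one candidate subfamily and the distribution over the whole family. The function names, the pseudocount, and the toy columns are all assumptions made for this example; the abstract does not specify how the published method computes or optimizes its score.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_distribution(column, pseudocount=1.0):
    """Return smoothed amino acid frequencies for one alignment column.

    `column` is a string with one residue per sequence; gaps and unknown
    characters are ignored. The pseudocount (an assumption of this sketch)
    keeps every frequency strictly positive.
    """
    counts = Counter(r for r in column if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(subfamily_column, family_column, pseudocount=1.0):
    """Kullback-Leibler divergence (in bits) of subfamily vs. whole family."""
    p = residue_distribution(subfamily_column, pseudocount)
    q = residue_distribution(family_column, pseudocount)
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


# Toy column: the family is split between K and E, the subfamily is all K.
family = "KKKKEEEE"
subfamily = "KKKK"
score = relative_entropy(subfamily, family)
```

A column whose subfamily distribution matches the family distribution scores zero, while a column that is conserved within the subfamily but variable across the family scores higher, which is the pattern expected of a tree-determinant position.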

3.
Dihydrofolate reductase (DHFR) is of significant recent interest as a target for drugs against parasitic and opportunistic infections. Understanding factors which influence DHFR homolog inhibitor specificity is critical for the design of compounds that selectively target DHFRs from pathogenic organisms over the human homolog. This paper presents a novel approach for predicting residues involved in ligand discrimination in a protein family using DHFR as a model system. In this approach, the relationship between inhibitor specificity and amino acid composition for sets of protein homolog pairs is examined. Similar inhibitor specificity profiles correlate with increased sequence homology at specific alignment positions. Residue positions that exhibit the strongest correlations are predicted as specificity determinants. Correlation analysis requires a quantitative measure of similarity in inhibitor specificity (S(lig)) for a pair of homologs. To this end, a method of calculating S(lig) values using K(I) values for the two homologs against a set of inhibitors as input was developed. Correlation analysis of S(lig) values to amino acid sequence similarity scores - obtained via multiple sequence alignments - was performed for individual residue alignment positions and sets of residues on 13 DHFRs. Eighteen alignment positions were identified with a strong correlation of S(lig) to sequence similarity. Of these, three lie in the active site; four are located proximal to the active site, four are clustered together in the adenosine binding domain and five on the βFβG loop. The validity of the method is supported by agreement between experimental findings and current predictions involving active site residues.

4.
This study was initiated to gain further insight into the structural features of the mammalian fetuin family. The cDNA structures of sheep and pig fetuin were determined. The cDNA insert encoding sheep (pig) fetuin comprised 1550 (1470) nucleotides, including 54 (46) nucleotides encoding a signal peptide of 18 (15) residues and 1038 (1041) nucleotides encoding the 346 (347) amino acids of the mature plasma protein. The predicted amino-terminal sequence of the mature pig fetuin was confirmed by the amino-terminal sequence of the purified protein. However, two alternative sheep amino-terminal sequences were found in fetuin purified from the plasma of a single sheep fetus; the minor product was the one predicted by comparison with other fetuin sequences while the major product was two amino acids longer. Comparison of the deduced amino acid sequences of sheep and pig fetuin showed an extensive sequence identity between them (75%) and with other proteins of the mammalian fetuin family, i.e. human alpha 2-HS glycoprotein, and bovine and rat fetuins. Twelve cysteine residues were found at invariant positions in all fetuin sequences, suggesting strongly that the arrangement of disulphide bridges identified in human alpha 2-HS glycoprotein is common to the members of the family. Further sequence comparisons revealed that the structures of mammalian fetuins are organised in three domains: two cystatin-like domains (D1 and D2) and a complex carboxyl-terminal domain (D3). The proposed three-domain structure of the protein is reflected in the organisation of the rat fetuin structural gene which has recently been published.

5.
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing for each amino acid position the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions.
Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.

6.
The multidrug efflux mechanism is the main cause of intrinsic drug resistance in bacteria. Mycobacterium multidrug resistant (MMR) protein belongs to the small multidrug resistant (SMR) family of proteins, causing multidrug resistance through proton (H+)-linked lipophilic cationic drug efflux across the cell membrane. In the present work, MMR is treated as a novel target to identify new molecular entities as inhibitors for drug resistance in Mycobacterium tuberculosis. In silico techniques are applied to evaluate the 3D structure of MMR protein. The putative amino acid residues present in the active site of MMR protein are predicted. Protein–ligand interactions are studied by docking cationic ligands transported by MMR protein. Virtual screening is carried out with an in-house library of small molecules against the grid created at the predicted active site residues in the MMR protein. Absorption, distribution, metabolism, and elimination (ADME) properties of the molecules with the best docking scores are predicted. The studies with cationic ligands and those of virtual screening are analysed for identification of new lead molecules as inhibitors for drug resistance caused by the MMR protein.

7.
In this article, we use the animal G-protein alpha subunit family as an example to illustrate a comprehensive analytical pipeline for detecting different types of functional divergence of protein families, which is phylogeny-dependent, combined with ancestral sequence inference and available protein structure information. In particular, we focus on (i) Type-I functional divergence, or site-specific rate shift, as typically exemplified by amino acid residues highly conserved in a subset of homologous genes but highly variable in a different subset of homologous genes, and (ii) Type-II functional divergence, or the shift of cluster-specific amino acid property, as exemplified by a radical shift of amino acid property between duplicate genes, which is otherwise evolutionarily conserved. We utilized the software DIVERGE2 to carry out these analyses. In the case of the G-protein alpha subunit gene family, we have predicted amino acid residues that are related to either Type-I or Type-II functional divergence. The inferred ancestral sequences for these sites are helpful to explore the trends of functional divergence. Finally, these predicted residues are mapped to the protein structures to test whether these residues may have 3D structure or solvent accessibility preference.

8.
The urea amidolyase (DUR1,2) gene of Saccharomyces cerevisiae.   Total citations: 5, self-citations: 0, citations by others: 5
The DNA sequence of the urea amidolyase (DUR1,2) gene from S. cerevisiae has been determined. The polypeptide structure deduced from the DNA sequence contains 1,835 amino acid residues and possesses a calculated weight of 201,665 daltons, which correlates favorably with that predicted from compositional analysis of purified protein (1,881 amino acid residues and a molecular weight of 203,900). The C-terminal 57 residues of the polypeptide exhibit significant homology with similarly situated sequences found in five other biotin carboxylases whose primary structures have been determined or deduced from protein and DNA sequence data, respectively. Major S1 nuclease protection fragments derived from DUR1,2 RNA-DNA hybrids exhibit apparent termini at positions -140 and -141 upstream of the coding region. The termini of minor protection fragments occur at eleven other positions.

9.
10.
Proteases recognize specific substrate sequences and catalyze the hydrolysis of targeted peptide bonds to activate or degrade them. It is particularly important to identify the recognition and binding mechanisms of protease–substrate complex structures in studies of drug development. Cleavage specificity in protease systems is generally determined by the amino acid profile, structural features, and distinct molecular interactions. In this work, substrate variability and substrate specificity of the NS3/4A serine protease encoded by the hepatitis C virus (HCV) were investigated by the biased sequence search threading (BSST) methodology. The available crystal structures of peptide-bound protease were used as templates as well as new complex structures that were generated via docking calculations. Threading various binding and nonbinding sequences as starting sequences over multiple templates, the potential sequence space was efficiently explored by a low-resolution knowledge-based scoring potential. The low-energy substrate sequences generated by the biased search are correlated with the natural substrates with conserved amino acid preferences, although some positions exhibit variability. Specifically, the amino acids which play essential roles in cleavage are mostly preferred. Potential substrate sequences were predicted by statistical probability approaches that consider the pairwise and triplewise interdependencies among residue positions in the low-energy sequences. The predicted substrate sequences also reproduce most of the natural substrate sequences, implying the complex interdependence between the different substrate residues. Consequently, the BSST seems to provide a powerful methodology for predicting the substrate specificity for the NS3/4A protease, which is a target in drug discovery studies for HCV.

11.
Nagao C  Izako N  Soga S  Khan SH  Kawabata S  Shirai H  Mizuguchi K 《Proteins》2012,80(10):2426-2436
Proteins interact with different partners to perform different functions and it is important to elucidate the determinants of partner specificity in protein complex formation. Although methods for detecting specificity determining positions have been developed previously, direct experimental evidence for these amino acid residues is scarce, and the lack of information has prevented further computational studies. In this article, we constructed a dataset that is likely to exhibit specificity in protein complex formation, based on available crystal structures and several intuitive ideas about interaction profiles and functional subclasses. We then defined a “structure-based specificity determining position (sbSDP)” as a set of equivalent residues in a protein family showing a large variation in their interaction energy with different partners. We investigated sequence and structural features of sbSDPs and demonstrated that their amino acid propensities significantly differed from those of other interacting residues and that the importance of many of these residues for determining specificity had been verified experimentally. Proteins 2012. © 2012 Wiley Periodicals, Inc.

12.
Three isoinhibitors have been isolated to homogeneity from the C-serum of the latex of the rubber tree, Hevea brasiliensis clone RRIM 600, and named HPI-1, HPI-2a and HPI-2b. The three inhibitors share the same amino acid sequence (69 residues) but the masses of the three forms were determined to be 14,893 ± 10, 7757 ± 5, and 7565 ± 5, respectively, indicating that post-translational modifications of the protein have occurred during latex collection. One adduct could be removed by reducing agents, and was determined to be glutathione, while the other adduct could not be removed by reducing agents and has not been identified. The N-termini of the inhibitor proteins were blocked by an acetylated Ala, but the complete amino acid sequence analysis of the deblocked inhibitors by Edman degradation of fragments from endopeptidase C digestion and mass spectrometry confirmed that the three isoinhibitors were derived from a single protein. The amino acid sequence of the protein differed at two positions from the sequence deduced from a cDNA reported in GenBank. The gene coding for the inhibitor is wound-inducible and is a member of the potato inhibitor I family of protease inhibitors. The inhibitor strongly inhibited subtilisin A, weakly inhibited trypsin, and did not inhibit chymotrypsin. The amino acid residues at the reactive site P1 and P1' were determined to be Gln45 and Asp46, respectively, residues rarely reported at the reactive site in potato inhibitor I family members. Comparison of amino acid sequences revealed that the HPI isoinhibitors shared from 33% to 55% identity (50-74% similarity) to inhibitors of the potato inhibitor I family. The properties of the isoinhibitors suggest that they may play a defensive role in the latex against pathogens and/or herbivores.

13.
14.
Structural genomics projects as well as ab initio protein structure prediction methods provide structures of proteins with no sequence or fold similarity to proteins with known functions. These are often low-resolution structures that may only include the positions of C alpha atoms. We present a fast and efficient method to predict DNA-binding proteins from just the amino acid sequences and low-resolution, C alpha-only protein models. The method uses the relative proportions of certain amino acids in the protein sequence, the asymmetry of the spatial distribution of certain other amino acids as well as the dipole moment of the molecule. These quantities are used in a linear formula, with coefficients derived from logistic regression performed on a training set, and DNA-binding is predicted based on whether the result is above a certain threshold. We show that the method is insensitive to errors in the atomic coordinates and provides correct predictions even on inaccurate protein models. We demonstrate that the method is capable of predicting proteins with novel binding site motifs and structures solved in an unbound state. The accuracy of our method is close to that of another published method that uses all-atom structures, time-consuming calculations and information on conserved residues.
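
The decision rule described in this abstract (a linear formula with logistic-regression coefficients, compared against a threshold) can be sketched as follows. This is only a minimal illustration: the feature names, coefficient values, intercept, and threshold are hypothetical placeholders, not the values fitted in the paper, and mapping the linear score through the logistic function before thresholding is an assumption of this sketch.

```python
import math


def linear_score(features, coefficients, intercept):
    """Weighted sum of the features plus an intercept."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


def predict_dna_binding(features, coefficients, intercept, threshold=0.5):
    """Return (is_binding, probability) for one protein.

    The linear score is mapped to a probability with the logistic function
    and compared against `threshold` (an assumed cutoff).
    """
    z = linear_score(features, coefficients, intercept)
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold, probability


# Hypothetical coefficients and features, for illustration only.
coefficients = {"arg_fraction": 2.0, "dipole_moment": 1.0}
intercept = -1.0
protein_a = {"arg_fraction": 1.0, "dipole_moment": 1.0}
protein_b = {"arg_fraction": 0.0, "dipole_moment": 0.0}

binds_a, prob_a = predict_dna_binding(protein_a, coefficients, intercept)
binds_b, prob_b = predict_dna_binding(protein_b, coefficients, intercept)
```

Because the logistic function is monotonic, thresholding the probability is equivalent to thresholding the linear score itself, so the cutoff can be expressed in either form.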

15.
M Streuli  N X Krueger  T Thai  M Tang    H Saito 《The EMBO journal》1990,9(8):2399-2407
Protein tyrosine phosphorylation is regulated by both protein tyrosine kinases and protein tyrosine phosphatases (PTPases). Recently, the structures of a family of PTPases have been described. In order to study the structure-function relationships of receptor-linked PTPases, we analyzed the effects of deletion and point mutations within the cytoplasmic region of the receptor-linked PTPases, LCA and LAR. We show that the first of the two domains has enzyme activity by itself, and that one cysteine residue in the first domain of both LCA and LAR is absolutely required for activity. The second PTPase-like domains do not have detectable catalytic activity using a variety of substrates, but sequences within the second domains influence substrate specificity. The functional significance of a stretch of 10 highly conserved amino acid residues surrounding the critical cysteine residue located in the first domain of LAR was assessed. At most positions, any substitution severely reduced enzyme activity, while missense mutations at the other positions tested could be tolerated to varying degrees depending on the amino acid substitution. It is suggested that this stretch of amino acids may be part of the catalytic center of PTPases.

16.
Pharmacogenomics is the study of the genetic basis for individual variation in response to drugs and other xenobiotics. Successful prediction of effects of genetic variations that change encoded amino acid sequences on protein function and their consequent biomedical implications depends on three-dimensional (3D) structures of the encoded amino acid sequences. To bridge sequence to function, thus facilitating an in-depth pharmacogenomic study, we tested the feasibility of the use of a semi-computational approach to predict 3D structures of rabbit and human indolethylamine N-methyltransferases (INMTs) from their amino acid sequences, which share less than 26% sequence identity with known protein 3D structures. Herein, we report 3D models of INMTs predicted by using the crystal structure of rat catechol O-methyltransferase as a template, testing of the models both computationally and experimentally, and successful use of the models in retrospective prediction of the effects of genetic polymorphisms and in identification of residues that contribute to observed species-specific differences in substrate affinity. The results encourage the use of the semi-computational approach to predict 3D protein structures for use in pharmacogenomic studies when de novo prediction of protein 3D structures from their amino acid sequences is still not feasible and X-ray crystallography and/or solution nuclear magnetic resonance spectroscopy can only determine 3D structures for a small number of known amino acid sequences. Electronic Supplementary Material available.

17.
A novel sequence-analysis technique for detecting correlated amino acid positions in intermediate-size protein families (50-100 sequences) was developed, and applied to study voltage-dependent gating of potassium channels. Most contemporary methods for detecting amino acid correlations within proteins use very large sets of data, typically comprising hundreds or thousands of evolutionarily related sequences, to overcome the relatively low signal-to-noise ratio in the analysis of co-variations between pairs of amino acid positions. Such methods are impractical for voltage-gated potassium (Kv) channels and for many other protein families that have not yet been sequenced to that extent. Here, we used a phylogenetic reconstruction of paralogous Kv channels to follow the evolutionary history of every pair of amino acid positions within this family, thus increasing detection accuracy of correlated amino acids relative to contemporary methods. In addition, we used a bootstrapping procedure to eliminate correlations that were statistically insignificant. These and other measures allowed us to increase the method's sensitivity, and opened the way to reliable identification of correlated positions even in intermediate-size protein families. Principal-component analysis applied to the set of correlated amino acid positions in Kv channels detected a network of inter-correlated residues, a large fraction of which were identified as gating-sensitive upon mutation. Mapping the network of correlated residues onto the 3D structure of the Kv channel from Aeropyrum pernix disclosed correlations between residues in the voltage-sensor paddle and the pore region, including regions that are involved in the gating transition. We discuss these findings with respect to the evolutionary constraints acting on the channel's various domains. The software is available on our website.

18.
Piv, a site-specific invertase from Moraxella lacunata, exhibits amino acid homology with the transposases of the IS110/IS492 family of insertion elements. The functions of conserved amino acid motifs that define this novel family of both transposases and site-specific recombinases (Piv/MooV family) were examined by mutagenesis of fully conserved amino acids within each motif in Piv. All Piv mutants altered in conserved residues were defective for in vivo inversion of the M. lacunata invertible DNA segment, but competent for in vivo binding to Piv DNA recognition sequences. Although the primary amino acid sequences of the Piv/MooV recombinases do not contain a conserved DDE motif, which defines the retroviral integrase/transposase (IN/Tnps) family, the predicted secondary structural elements of Piv align well with those of the IN/Tnps for which crystal structures have been determined. Molecular modelling of Piv based on these alignments predicts that E59, conserved as either E or D in the Piv/MooV family, forms a catalytic pocket with the conserved D9 and D101 residues. Analysis of Piv E59G confirms a role for E59 in catalysis of inversion. These results suggest that Piv and the related IS110/IS492 transposases mediate DNA recombination by a common mechanism involving a catalytic DED or DDD motif.

19.
Cloned cDNAs encoding both subunits of Drosophila melanogaster casein kinase II have been isolated by immunological screening of lambda gt11 expression libraries, and the complete amino acid sequence of both polypeptides has been deduced by DNA sequencing. The alpha cDNA contained an open reading frame of 336 amino acid residues, yielding a predicted molecular weight for the alpha polypeptide of 39,833. The alpha sequence contained the expected semi-invariant residues present in the catalytic domain of previously sequenced protein kinases, confirming that it is the catalytic subunit of the enzyme. Pairwise homology comparisons between the alpha sequence and the sequences of a variety of vertebrate protein kinases suggested that casein kinase II is a distantly related member of the protein kinase family. The beta subunit was derived from an open reading frame of 215 amino acid residues and was predicted to have a molecular weight of 24,700. The beta subunit exhibited no extensive homology to other proteins whose sequences are currently known.

20.
The molecular recognition and discrimination of very similar ligand moieties by proteins are important subjects in protein–ligand interaction studies. Specificity in the recognition of molecules is determined by the arrangement of protein and ligand atoms in space. The three pyrimidine bases, viz. cytosine, thymine, and uracil, are structurally similar, but the proteins that bind to them are able to discriminate them and form interactions. Since nonbonded interactions are responsible for molecular recognition processes in biological systems, our work attempts to understand some of the underlying principles of such recognition of pyrimidine molecular structures by proteins. The preferences of the amino acid residues to contact the pyrimidine bases in terms of nonbonded interactions; amino acid residue–ligand atom preferences; main chain and side chain atom contributions of amino acid residues; and solvent-accessible surface area of ligand atoms when forming complexes are analyzed. Our analysis shows that the amino acid residues tyrosine and phenylalanine are highly involved in the pyrimidine interactions. Arginine prefers contacts with the cytosine base. The similarities and differences that exist between the interactions of the amino acid residues with each of the three pyrimidine base atoms in our analysis provide insights that can be exploited in designing specific inhibitors competitive to the ligands.

