Similar articles
1.
A novel alignment-free method for computing functional similarity of membrane proteins based on features of hydropathy distribution is presented. The features of hydropathy distribution are used to represent protein families as hydropathy profiles. The profiles statistically summarize the hydropathy distribution of member proteins. The summation is made by using hydropathy features that numerically represent structurally/functionally significant portions of protein sequences. The hydropathy profiles are numerical vectors that are points in a high dimensional ‘hydropathy’ space. Their similarities are identified by projection of the space onto principal axes. Here, the approach is applied to the secondary transporters. The analysis using the presented approach is validated by the standard classification of the secondary transporters. The presented analysis allows for prediction of function attributes for proteins of uncharacterized families of secondary transporters. The results obtained using the presented analysis may help to characterize unknown function attributes of secondary transporters. They also show that analysis of hydropathy distribution can be used for function prediction of membrane proteins.  相似文献   
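The projection step described in this abstract can be illustrated with a minimal sketch. It assumes each family profile is already a fixed-length numeric vector and finds only the first principal axis by power iteration. The function name `first_principal_axis`, the starting vector, and the iteration count are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def first_principal_axis(vectors, iterations=200):
    """Return (axis, scores): the leading principal axis of the centered
    vectors and the projection of each vector onto that axis."""
    n, dim = len(vectors), len(vectors[0])
    # Center the data so the axis describes variance, not the mean offset.
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    # Non-uniform start (assumption) to reduce the chance of starting
    # exactly orthogonal to the leading axis.
    start = [1.0 + j for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in start))
    axis = [x / norm for x in start]
    for _ in range(iterations):
        # Apply the scatter matrix implicitly: X^T (X a).
        scores = [sum(r[j] * axis[j] for j in range(dim)) for r in centered]
        new = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break  # no variance left along the current direction
        axis = [x / norm for x in new]
    scores = [sum(r[j] * axis[j] for j in range(dim)) for r in centered]
    return axis, scores

# Toy "profiles" lying on a line; real profiles would have many dimensions.
axis, scores = first_principal_axis([[0, 0], [1, 1], [2, 2], [3, 3]])
```

Profiles with similar scores along the leading axes would be candidates for functional similarity; a full analysis would use several axes rather than one.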

2.
Distribution and complementarity of hydropathy in multisubunit proteins   Cited by: 7 (self-citations: 0, citations by others: 7)
A P Korn  R M Burnett 《Proteins》1991,9(1):37-55
A survey of 40 multisubunit proteins and 2 protein-protein complexes was performed to assay quantitatively the distribution of hydropathy among the exterior surface, interior, contact surface, and noncontact exterior surface of the isolated subunits. We suggest a useful way to present this distribution by using a "hydropathy level diagram." Additionally, we have devised a function called "hydropathy complementarity" to quantitate the degree to which interacting surfaces have matching hydropathy distributions. Our survey revealed the following patterns: (1) The difference in hydropathy between the interior and exterior of subunits is a fairly invariant quantity. (2) On average, the hydropathy of the contact surface is higher than that of the exterior surface, but is not greater than that of the protein as a whole. There was variation, however, among the proteins. In some instances, the contact surface was more hydrophilic than the noncontact exterior, and in a few cases the contact surface was as hydrophobic as the protein interior. (3) The average interface manifests significant hydropathy complementarity, signifying that proteins interact by placing hydrophobic centers of one surface against hydrophobic centers of the other surface, and by similarly matching hydrophilic centers. As a measure of recognition and specificity, hydropathy complementarity could be a useful tool for predicting correct docking of interacting proteins. We suggest that high hydropathy complementarity is associated with static inflexible interactions. (4) We have found that some subunits that bind predominantly through hydrophilic forces, such as hydrogen bonds, ionic pairs, and water and metal bridges, are involved in dynamic quaternary organization and allostery.  相似文献   

4.
In the post-genomic era, a plethora of protein structures have been solved, but the functions of some of them are unknown. In this context, the role of the hydropathy index of amino acids in predicting the function of a structurally known but functionally unknown protein was explored. Initially, the serine protease class was taken for analysis. Several analyses were performed, including calculation of the average hydropathy index over a five-residue window of a given sequence and hydropathy cluster analysis. Among these, the distribution of hydropathy clusters suggests that the location of these clusters is conserved for a given class of proteins. Hence, this methodology was extended to different classes of proteins and to a protein with unknown function.
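The five-residue window average mentioned in this abstract can be sketched as follows. The abstract does not name a hydropathy scale, so the Kyte-Doolittle values are used here as an assumption; the function name `window_hydropathy` and the example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_hydropathy(sequence, window=5):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(values) < window:
        return []  # sequence too short for even one full window
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# A made-up nine-residue sequence yields five overlapping windows.
profile = window_hydropathy("MKTLLVAGG")
```

Runs of consecutive windows above a chosen threshold could then be treated as hydrophobic clusters; the threshold and cluster definition used in the paper are not given in the abstract.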

5.
A classification scheme for membrane proteins is proposed that clusters families of proteins into structural classes based on hydropathy profile analysis. The averaged hydropathy profiles of protein families are taken as fingerprints of the 3D structure of the proteins and, therefore, are able to detect more distant evolutionary relationships than amino acid sequences. A procedure was developed in which hydropathy profile analysis is used initially as a filter in a BLAST search of the NCBI protein database. The strength of the procedure is demonstrated by the classification of 29 families of secondary transporters into a single structural class, termed ST[3]. An exhaustive search of the database revealed that the 29 families contain 568 unique sequences. The proteins are predominantly from prokaryotic origin and most of the characterized transporters in ST[3] transport organic and inorganic anions and a smaller number are Na(+)/H(+) antiporters. All modes of energy coupling (symport, antiport, uniport) are found in structural class ST[3]. The relevance of the classification for structure/function prediction of uncharacterised transporters in the class is discussed.  相似文献   

6.
Prediction of the subcellular location of apoptosis proteins   Cited by: 4 (self-citations: 0, citations by others: 4)
Apoptosis proteins have a central role in the development and homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. The function of an apoptosis protein is closely related to its subcellular location. Based on the concept that the subcellular location of an apoptosis protein is mainly determined by its amino acid sequence, a new algorithm for predicting the subcellular location of an apoptosis protein is proposed. Using a distinctive set of information parameters derived from the primary sequences of 317 apoptosis proteins, the increment of diversity (ID), the sole prediction parameter, is calculated. Jackknife tests on the expanded dataset give higher predictive success rates than previous algorithms. Our prediction results show that the local compositions of twin amino acids and the hydropathy distribution are very useful for predicting the subcellular location of a protein.
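The increment of diversity can be illustrated with a short sketch. It assumes the commonly used definition, in which a count vector with total N has diversity D = N ln N − Σ nᵢ ln nᵢ and ID(X, Y) = D(X + Y) − D(X) − D(Y). The abstract does not spell out the formula or the logarithm base, so both are assumptions here, and the function names are hypothetical.

```python
import math

def diversity(counts):
    """Diversity measure D = N*ln(N) - sum(n_i*ln(n_i)) of a count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(c * math.log(c) for c in counts if c > 0)

def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two equal-length count vectors."""
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)

# Identical compositions give an ID of zero; disjoint ones give a larger value.
same = increment_of_diversity([3, 1, 2], [3, 1, 2])
different = increment_of_diversity([1, 0], [0, 1])
```

Under this definition a query sequence would be assigned to the class whose reference counts give the smallest ID, since a small increment indicates similar composition.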

7.
Punta M  Maritan A 《Proteins》2003,50(1):114-121
In this article, a membrane-propensity scale for amino acids is derived using only two ingredients: (i) a set of transmembrane helix segments from membrane protein crystal structures and (ii) the requirement that each member of the set has a free energy lower than that of a typical soluble protein sequence of the same length. Although the most widely used hydropathy scales satisfy this requirement, we use an optimization procedure that allows for extraction of an optimal scale, which correlates equally well with those scales. We show that, if the choice of the sequence database is accurate, significant knowledge-based scales, which are robust with respect to changes in the learning set, can be easily derived. The obtained scales can be used for transmembrane helix prediction. The predictive power of one of these scales is tested on databases of membrane proteins, soluble proteins, and signal peptides, and its performance is found to be comparable with that of the hydropathy scales.

8.
The cDNA and/or genomic DNA sequences of 13 globulin storage proteins from flowering plants (angiosperms) are now known. They represent 8 genera, 5 families and 5 orders of plants and include one monocotyledonous species. Here, the coding nucleotide and amino acid sequences of these proteins are compared by dot matrix analysis and gross protein domains visualized by hydropathy analyses. The vestigial homologies visualized by these means indicate that all of the globulin storage proteins of flowering plants have emanated from 2 genes that existed at the beginning of angiosperm evolution. A curious polypeptide domain of 150–200 amino acids located near the N terminus is found in a globulin subgroup of 2 genera widely separated phylogenetically. The domain appears to have resulted from an ancient insertion that has been deleted in most of its descendant genes.

9.
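Comparison of the amino acid composition of proteins from thermophilic and mesophilic microorganisms   Cited by: 11 (self-citations: 0, citations by others: 11)
The thermophilic character of thermophilic microorganisms is closely tied to the high thermal stability of their proteins. To explore the mechanism of thermostability in thermophilic proteins and to compare the amino acid composition of proteins from thermophilic and mesophilic microorganisms, 110 pairs of homologous protein sequences, each pair comprising one thermophilic and one mesophilic protein, were collected. The two groups were compared for the content of each amino acid, hydrophobic amino acid composition, hydropathy index, and charged amino acid composition. The groups showed small but statistically significant differences in the content of several amino acids, and thermophilic proteins had higher average hydrophobicity and a higher charged amino acid composition than mesophilic proteins. Analysis of the "aliphatic index" of the two groups showed that the higher aliphatic index of thermophilic proteins is due to their higher leucine content and is unrelated to the other amino acids that contribute to the index; the significance of this index is therefore considered questionable. Comparing the amino acid compositions of a large number of homologous thermophilic and mesophilic proteins can reveal some general rules of protein thermostability.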

10.
The sodium solute symporters (SSS) and neurotransmitter sodium symporters (NSS) are two families of secondary transporters that are not related in amino acid sequence. Nonetheless, recent crystal structures showed that the Na+/galactose (SSS) and Na+/leucine (NSS) transporters have similar core structures. The structural relatedness highlights the need for classification methods for membrane protein structures based on other criteria than amino acid similarity. Here, we demonstrate that a method based on hydropathy profile alignments convincingly identifies structural similarity between the NSS and SSS families. Most importantly, the method shows that one of the largest transporter families for which a crystal structure is elusive (the amino acid/polyamine/organocation or APC superfamily), also shares the similar core structure observed for the Na+/galactose and Na+/leucine transporters. The APC superfamily contains the major amino acid transporter families that are found throughout life. Insight into their structure will significantly facilitate the studies of this important group of transporters.  相似文献   

13.
A structural class in the MemGen classification of membrane proteins is a set of evolutionarily related proteins sharing a similar global fold. A structural class contains both closely related pairs of proteins, for which homology is clear from sequence comparison, and very distantly related pairs, for which it is not possible to establish homology based on sequence similarity alone. In the latter case the evolutionary link is based on hydropathy profile analysis. Here, we use these evolutionarily related sets of proteins to analyze the relationship between E-values in BLAST searches, sequence similarities in multiple sequence alignments, and structural similarities in hydropathy profile analyses. Two structural classes of secondary transporters, termed ST[3], which includes the Ion Transporter (IT) superfamily, and ST[4], which includes the DAACS family (TC# 2.A.23), were extracted from the NCBI protein database. ST[3] contains 2051 unique sequences distributed over 32 families and 59 subfamilies. ST[4] is a smaller class containing 399 unique sequences distributed over 2 families and 7 subfamilies. One subfamily in ST[4] contains a new class of binding protein dependent secondary transporters. Comparison of the averaged hydropathy profiles of the subfamilies in ST[3] and ST[4] revealed that the two classes represent different folds. Divergence of the sequences in ST[4] is much smaller than observed in ST[3], suggesting different constraints on the proteins during evolution.
Analysis of the correlation between the evolutionary relationship of pairs of proteins in a class and the BLAST E-value revealed that: (i) the BLAST algorithm is unable to pick up the majority of the links between proteins in structural class ST[3], (ii) ‘low complexity filtering’ and ‘composition based statistics’ improve the specificity, but strongly reduce the sensitivity of BLAST searches for distantly related proteins, indicating that these filters are too stringent for the proteins analyzed, and (iii) the E-value cut-off, which may be used to evaluate evolutionary significance of a hit in a BLAST search is very different for the two structural classes of membrane proteins.  相似文献   

14.
Several studies based on the known three-dimensional (3-D) structures of proteins show that two homologous proteins with insignificant sequence similarity could adopt a common fold and may perform the same or similar biochemical functions. Hence, it is appropriate to use similarities in the 3-D structure of proteins rather than amino acid sequence similarities in modelling the evolution of distantly related proteins. Here we present an assessment of using 3-D structures in modelling the evolution of homologous proteins. Using a dataset of 108 protein domain families of known structure with at least 10 members per family, we present a comparison of the extent of structural and sequence dissimilarities among pairs of proteins, which are inputs into the construction of phylogenetic trees. We find that the correlation between the structure-based dissimilarity measures and the sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low sequence similarity among the members, the correlation between the sequence-based and the structure-based dissimilarities is poor. In these cases the structure-based dendrogram clusters proteins with the most similar biochemical functional properties better than the sequence-similarity based dendrogram. In multi-domain protein families and disulphide-rich protein families the correlation coefficient for the match of sequence-based and structure-based dissimilarity (SDM) measures can be poor even though the sequence identity could be higher than 30%. Hence it is suggested that protein evolution is best modelled using 3-D structures if the sequence similarities (SSM) of the homologues are very low.

15.
Bastolla U  Porto M  Roman HE  Vendruscolo M 《Gene》2005,347(2):219-230
We review and further develop an analytical model that describes how thermodynamic constraints on the stability of the native state influence protein evolution in a site-specific manner. To this end, we represent both protein sequences and protein structures as vectors: structures are represented by the principal eigenvector (PE) of the protein contact matrix, a quantity that closely resembles the effective connectivity of each site; sequences are represented through the "interactivity" of each amino acid type, using novel parameters that are correlated with hydropathy scales. These interactivity parameters are more strongly correlated than the other hydropathy scales that we examine with: (1) the change upon mutation of the unfolding free energy of proteins with two-state thermodynamics; (2) genomic properties such as genome size and genome-wide GC content; (3) the main eigenvectors of the substitution matrices. The evolutionary average of the interactivity vector correlates very strongly with the PE of a protein structure. Using this result, we derive an analytic expression for site-specific distributions of amino acids across protein families in the form of Boltzmann distributions whose "inverse temperature" is a function of the PE component. We show that our predictions are in agreement with site-specific amino acid distributions obtained from the Protein Data Bank, and we determine the mutational model that best fits the observed site-specific amino acid distributions. Interestingly, the optimal model almost minimizes the rate at which deleterious mutations are eliminated by natural selection.

16.
Many strains of Streptococcus pyogenes are known to express a receptor for IgA. The complete nucleotide sequence of the gene for such a receptor, protein Arp4, has been determined. The deduced amino acid sequence of 386 residues includes a signal sequence of 41 amino acids and a putative membrane anchor region, both of which are homologous to similar regions in other streptococcal surface proteins. The processed form of the IgA receptor has a length of 345 amino acids and a calculated molecular weight of 39544. The N-terminal sequence of the processed form is different from that previously found for a similar IgA receptor isolated from an S. pyogenes strain of type M60. The sequence of protein Arp4 shows extensive homology to the C-terminal half of streptococcal M proteins, but not to the streptococcal IgG receptor protein G or staphylococcal protein A. Apart from the membrane anchor, this homology includes a sequence of 119 amino acid residues containing three repeated units and a 54-residue sequence without repeats. The protein expressed in Escherichia coli is found in the periplasmic space, in which it constitutes the major protein. Protein Arp4 is the first example of a surface protein that has both immunoglobulin-binding capacity and structural features characteristic of M proteins.

17.
We present a new method for predicting the secondary structure of globular proteins based on non-linear neural network models. Network models learn from existing protein structures how to predict the secondary structure of local sequences of amino acids. The average success rate of our method on a testing set of proteins non-homologous with the corresponding training set was 64.3% on three types of secondary structure (alpha-helix, beta-sheet, and coil), with correlation coefficients of C_alpha = 0.41, C_beta = 0.31 and C_coil = 0.41. These quality indices are all higher than those of previous methods. The prediction accuracy for the first 25 residues of the N-terminal sequence was significantly better. We conclude from computational experiments on real and artificial structures that no method based solely on local information in the protein sequence is likely to produce significantly better results for non-homologous proteins. The performance of our method on homologous proteins is much better than on non-homologous proteins, but is not as good as simply assuming that homologous sequences have identical structures.

18.
We report here the cloning and sequence analysis of cDNAs for a pair of closely related proteins from soybean (Glycine max [L.] Merr. cv. Williams 82) stems. Both proteins are abundant in soluble extracts of seedling stems but not of roots. One of these proteins (Mr = 28 kDa) is also found in the cell wall fraction of stems and accumulates there when seedlings are exposed to mild water deficit for 48 h. The mRNA for these proteins is most abundant in the stem region which contains dividing cells, less abundant in elongating and mature stem cells, and rare in roots. Using antiserum against the 28 kDa protein, we isolated cDNA clones encoding it and an antigenically related 31 kDa protein. The two cDNAs are 80% homologous in nucleotide and amino acid coding sequence. The predicted proteins have similar hydropathy profiles, and contain putative NH2-terminal signal sequences and a single putative N-linked glycosylation site. The two proteins differ significantly in calculated pI (28 kDa = 8.6; 31 kDa = 5.8), and the charge difference is demonstrated on two-dimensional gels. The proteins described here may function as somatic storage proteins during early seedling development, and are closely related to glycoproteins which accumulate in vacuoles of paraveinal mesophyll cells of fully expanded soybean leaves when plants are depodded.

19.
In a previous paper we obtained ten (orthogonal) factors, linear combinations of which can express the properties of the 20 naturally occurring amino acids. In this paper, we assume that the most important properties (linear combinations of these ten factors) that determine the three-dimensional structure of a protein are conserved properties, i.e., are those that have been conserved during evolution. Two definitions of a conserved property are presented: (1) a conserved property for an average protein is defined as that linear combination of the ten factors that optimally expresses the similarity of one amino acid to another (hence, little change during evolution), as given by the relatedness odds matrix of Dayhoff et al.; (2) a conserved property for each position in the amino acid sequence (locus) of a specific family of homologous proteins (the cytochrome c family or the globin family) is defined as that linear combination of the ten factors that is common among a set of amino acids at a given locus when the sequences are properly aligned. When the specificity at each locus is averaged over all loci, the same features are observed for three expressions of these two definitions, namely the conserved property for an average protein, the average conserved property for the cytochrome c family, and the average conserved property for the globin family; we find that bulk and hydrophobicity (information about packing and long-range interactions) are more important than other properties, such as the preference for adopting a specific backbone structure (information about short-range interactions). We also demonstrate that the sequence profile of a conserved property, defined for each locus of a protein family (definition 2), corresponds uniquely to the three-dimensional structure, while the conserved property for an average protein (definition 1) is not useful for the prediction of protein structure.
The amino acid sequences of numerous proteins are searched to find those that are similar, in terms of the conserved properties (definition 2), to sequences of the same size from one of the homologous families (cytochrome c and globin, respectively) for whose loci the conserved properties were defined. Many similar sequences are found, the number of similarities decreasing with increasing size of the segment. However, the segments must be rather long (15 residues) before the comparisons become meaningful. As an example, one sufficiently large sequence (20 residues) from a protein of known structure (apo-liver alcohol dehydrogenase that is not a member of either family) is found to be similar in the conserved properties to a particular sequence of a member of the family of human hemoglobin chains, and the two sequences have similar structures. This means that, since conserved properties are expected to be structure determinants, we can use the conserved properties to predict an initial protein structure for subsequent energy minimization for a protein for which the conserved properties are similar to those of a family of proteins with a sufficiently large number of homologous amino acid sequences; such a large number of homologous sequences is required to define a conserved property for each locus of the homologous protein family.

20.
Many outer membrane proteins (OMPs) in Gram-negative bacteria possess known beta-barrel three-dimensional (3D) structures. These proteins, including channel-forming transmembrane porins, are diverse in sequence but exhibit common structural features. We here report computational analyses of six outer membrane proteins of known 3D structures with respect to (1) secondary structure, (2) hydropathy, and (3) amphipathicity. Using these characteristics, as well as the presence of an N-terminal targeting sequence, a program was developed allowing prediction of integral membrane beta-barrel proteins encoded within any completely sequenced prokaryotic genome. This program, termed the beta-barrel finder (BBF) program, was used to analyze the proteins encoded within the Escherichia coli genome. Out of 4290 sequences examined, 118 (2.8%) were retrieved. Of these, almost all known outer membrane proteins with established beta-barrel structures as well as many probable outer membrane proteins were identified. This program should be useful for predicting the occurrence of outer membrane proteins in bacteria with completely sequenced genomes.  相似文献   
