Similar literature
1.
The presence in proteins of amino acid residues that change in concert during evolution is associated with maintaining the protein's spatial structure and function. As with morphological features, correlated substitutions may cause homoplasies, that is, the independent evolution of identical non-homologous adaptations. Our data, obtained on model phylogenetic trees and the corresponding sets of sequences, show that correlated substitutions distort the results of phylogenetic reconstruction. A method for accounting for co-evolving amino acid residues in phylogenetic analysis is proposed. According to this method, only a single site from each group of correlated amino acid positions is retained, and the other positions are excluded from further phylogenetic analysis. Simulations show that replacing, on average, 8% of the variable positions in a pair of model sequences with coordinately evolving amino acid residues is enough to change the tree topology. Removing such residues from the sequences before phylogenetic analysis restores the correct topology.
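
The filtering step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how correlated positions are detected, so the criterion used here (two variable columns that split the sequences into exactly the same groups) is only a stand-in assumption, and the function names `covary` and `drop_correlated_sites` are hypothetical.

```python
def column(alignment, i):
    """Return alignment column i as a list of residues."""
    return [seq[i] for seq in alignment]


def covary(col_a, col_b):
    """ASSUMED toy criterion, not the authors' statistic: two variable
    columns are 'correlated' when residues map one-to-one between them,
    i.e. both columns partition the sequences identically."""
    if len(set(col_a)) < 2 or len(set(col_b)) < 2:
        return False  # invariant columns carry no covariation signal
    a_to_b, b_to_a = {}, {}
    for a, b in zip(col_a, col_b):
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


def drop_correlated_sites(alignment):
    """Keep the first site of every group of correlated columns and
    discard the rest. Returns (filtered sequences, kept column indices)."""
    n_cols = len(alignment[0])
    kept, removed = [], set()
    for i in range(n_cols):
        if i in removed:
            continue
        kept.append(i)
        for j in range(i + 1, n_cols):
            if j not in removed and covary(column(alignment, i),
                                           column(alignment, j)):
                removed.add(j)
    filtered = ["".join(seq[i] for i in kept) for seq in alignment]
    return filtered, kept


if __name__ == "__main__":
    aln = ["AKLD", "AKLD", "GRLE", "GRIE"]
    print(drop_correlated_sites(aln))
```

In the toy alignment, columns 0, 1 and 3 change together, so only column 0 is kept alongside the independent column 2. A real analysis would replace `covary` with a proper statistical test, since this exact-match rule would also discard independent sites that happen to support the same split.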

2.
The ras-oncogene-encoded p21 protein becomes oncogenic if amino acid substitutions occur at critical positions in the polypeptide chain. The most commonly found oncogenic forms contain Val in place of Gly 12 or Leu in place of Gln 61. To determine the effects of these substitutions on the three-dimensional structure of the whole p21 protein, we have performed molecular dynamics calculations on each of these three proteins bound to GDP and magnesium ion to compute the average structures of each of the three forms. Comparisons of the computed average structures show that both oncogenic forms with Val 12 and Leu 61 differ substantially in structure from that of the wild type (containing Gly 12 and Gln 61) in discrete regions: residues 10–16, 32–47, 55–74, 85–89, 100–110, and 119–134. All of these regions occur in exposed loops, and several of them have already been found to be involved in the cellular functioning of the p21 protein. These regions have also previously been identified as the most flexible domains of the wild-type protein and have been found to be the same ones that differ in conformation between transforming and nontransforming p21 mutant proteins, neither of which binds nucleotide. The two oncogenic forms have similar conformations in their carboxyl-terminal domains, but differ in conformation at residues 32–47 and 55–74. The former region is known to be involved in the interaction with at least three downstream effector target proteins. Thus, differences in structure between the two oncogenic proteins may reflect different relative affinities of each oncogenic protein for each of these effector targets. The latter region, 55–74, is known to be a highly mobile segment of the protein. The results strongly suggest that critical oncogenic amino acid substitutions in the p21 protein cause changes in the structures of vital domains of this protein.

3.
Halophilic (literally salt-loving) archaea are a highly evolved group of organisms that are uniquely able to survive in and exploit hypersaline environments. In this review, we examine the potential interplay between fluctuations in environmental salinity and the primary sequence and tertiary structure of halophilic proteins. The proteins of halophilic archaea are highly adapted and magnificently engineered to function in an intracellular milieu that is in ionic balance with an external environment containing between 2 and 5 M inorganic salt. To understand the nature of halophilic adaptation and to visualize this interplay, the sequences of genes encoding the L11, L1, L10, and L12 proteins of the large ribosome subunit and Mn/Fe superoxide dismutase proteins from three genera of halophilic archaea have been aligned and analyzed for the presence of synonymous and nonsynonymous nucleotide substitutions. Compared to homologous eubacterial genes, these halophilic genes exhibit an inordinately high proportion of nonsynonymous nucleotide substitutions that result in amino acid replacement in the encoded proteins. More than one-third of the replacements involve acidic amino acid residues. We suggest that fluctuations in environmental salinity provide the driving force for fixation of the excessive number of nonsynonymous substitutions. Tinkering with the number, location, and arrangement of acidic and other amino acid residues influences the fitness (i.e., hydrophobicity, surface hydration, and structural stability) of the halophilic protein. Tinkering is also evident at halophilic protein positions monomorphic or polymorphic for serine; more than one-third of these positions use both the TCN and the AGY serine codons, indicating that there have been multiple nonsynonymous substitutions at these positions. 
Our model suggests that fluctuating environmental salinity prevents optimization of fitness for many halophilic proteins and helps to explain the unusual evolutionary divergence of their encoding genes.
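
The serine-codon argument in this abstract can be checked with a short sketch. It assumes the standard genetic code (the abstract does not state which translation table was used), and the helper names `classify` and `min_changes_between_serine_families` are illustrative only. It shows why a position that uses both TCN and AGY codons implies more than one substitution: the two families differ at a minimum of two nucleotides.

```python
from itertools import product

# Standard genetic code (assumed), in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a, b):
    """Number of nucleotide differences between two codons."""
    return sum(x != y for x, y in zip(a, b))


def classify(codon_a, codon_b):
    """Label a codon change as identical, synonymous or nonsynonymous."""
    if codon_a == codon_b:
        return "identical"
    same_aa = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return "synonymous" if same_aa else "nonsynonymous"


def min_changes_between_serine_families():
    """Smallest number of nucleotide changes separating any TCN serine
    codon from any AGY serine codon."""
    serine = [c for c, aa in CODON_TABLE.items() if aa == "S"]
    tcn = [c for c in serine if c.startswith("TC")]
    agy = [c for c in serine if c.startswith("AG")]
    return min(hamming(a, b) for a in tcn for b in agy)


if __name__ == "__main__":
    print(min_changes_between_serine_families())
    print(classify("TCT", "ACT"), classify("ACT", "AGT"))
```

Either single-step route between the families passes through a non-serine codon (for example TCT to ACT, threonine, to AGT), so each step is a nonsynonymous substitution.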

4.
Automatic methods for predicting functionally important residues
Sequence analysis is often the first guide for the prediction of residues in a protein family that may have functional significance. A few methods have been proposed which use the division of protein families into subfamilies in the search for those positions that could have some functional significance for the whole family, but at the same time which exhibit the specificity of each subfamily ("Tree-determinant residues"). However, there are still many unsolved questions like the best division of a protein family into subfamilies, or the accurate detection of sequence variation patterns characteristic of different subfamilies. Here we present a systematic study in a significant number of protein families, testing the statistical meaning of the Tree-determinant residues predicted by three different methods that represent the range of available approaches. The first method takes as a starting point a phylogenetic representation of a protein family and, following the principle of Relative Entropy from Information Theory, automatically searches for the optimal division of the family into subfamilies. The second method looks for positions whose mutational behavior is reminiscent of the mutational behavior of the full-length proteins, by directly comparing the corresponding distance matrices. The third method is an automation of the analysis of distribution of sequences and amino acid positions in the corresponding multidimensional spaces using a vector-based principal component analysis. These three methods have been tested on two non-redundant lists of protein families: one composed by proteins that bind a variety of ligand groups, and the other composed by proteins with annotated functionally relevant sites. In most cases, the residues predicted by the three methods show a clear tendency to be close to bound ligands of biological relevance and to those amino acids described as participants in key aspects of protein function. 
These three automatic methods provide a wide range of possibilities for biologists to analyze their families of interest, in a similar way to the one presented here for the family of proteins related to ras-p21.

5.
The immune response to beta-(1,6)-galactan in the BALB/c mouse has been well characterized and includes the amino acid sequence determination of 13 monoclonal antibodies. The genetic potential encoding the VH regions of these antibodies has been determined by isolation and sequencing of homologous germline genes. The germline repertoire encoding these proteins was found to consist of two closely related genes. One of these directly encodes the VH segments of seven Gal-binding proteins, and the second directly encodes one additional protein sequence. Sequence variations found in the VH regions of five other Gal-binding proteins can be explained by somatic mutations leading to single base substitutions in the more frequently used gene. Since four of the hybridoma proteins exhibiting somatic mutations are of the IgM class, these results indicate that somatic mutation, in this system, is not associated with class switching and can apparently be initiated early in B-cell development. The two Gal genes are the only members of a very restricted multigene family and probably result from a gene duplication estimated to have occurred 1.4-2.8 million years ago. Three other genes hybridizing at moderate stringency to a VHGal probe were also sequenced and were found to be members of two additional VHIII families. Studies of the silent to replacement substitution ratios of these and other VH genes indicate that the number of silent substitutions found in immunoglobulin VH genes is lower than expected when compared with proteins such as preproinsulin and globin. Analysis of base composition reflected in these sequences indicates a marked increase of A-T% in the first and second codon positions of complementarity determining regions (CDR), which may be important in facilitating point mutations.

6.
Two different states of human immunodeficiency virus type 1 are apparent in the asymptomatic and late stages of infection. Important determinants associated with these two states have been found within the V3 loop of the viral Env protein. In this study, two large data sets of published V3 sequences were analyzed to identify patterns of sequence variability that would correspond to these two states of the virus. We were especially interested in the pattern of basic amino acid substitutions, since the presence of basic amino acids in V3 has been shown to change virus tropism in cell culture. Four features of the sequence heterogeneity in V3 were observed: (i) approximately 70% of all nonconservative basic substitutions occur at four positions in V3, and V3 sequences with a basic substitution in at least one of these four positions contain approximately 95% of all nonconservative basic substitutions; (ii) substitution patterns within V3 are influenced by the identity of the amino acid at position 25; (iii) sequence polymorphisms account for a significant fraction of uncharged amino acid substitutions at several positions in V3, and sequence heterogeneity other than these polymorphisms is most significant at two positions near the tip of V3; and (iv) sequence heterogeneity in V3 (in addition to the basic amino acid substitutions) is approximately twofold greater in V3 sequences that contain basic amino acid substitutions. By using this sequence analysis, we were able to identify distinct groups of V3 sequences in infected patients that appear to correspond to these two virus states. The identification of these discrete sequence patterns in vivo demonstrates how the V3 sequence can be used as a genetic marker for studying the two states of human immunodeficiency virus type 1.

7.
A novel sequence-analysis technique for detecting correlated amino acid positions in intermediate-size protein families (50-100 sequences) was developed, and applied to study voltage-dependent gating of potassium channels. Most contemporary methods for detecting amino acid correlations within proteins use very large sets of data, typically comprising hundreds or thousands of evolutionarily related sequences, to overcome the relatively low signal-to-noise ratio in the analysis of co-variations between pairs of amino acid positions. Such methods are impractical for voltage-gated potassium (Kv) channels and for many other protein families that have not yet been sequenced to that extent. Here, we used a phylogenetic reconstruction of paralogous Kv channels to follow the evolutionary history of every pair of amino acid positions within this family, thus increasing detection accuracy of correlated amino acids relative to contemporary methods. In addition, we used a bootstrapping procedure to eliminate correlations that were statistically insignificant. These and other measures allowed us to increase the method's sensitivity, and opened the way to reliable identification of correlated positions even in intermediate-size protein families. Principal-component analysis applied to the set of correlated amino acid positions in Kv channels detected a network of inter-correlated residues, a large fraction of which were identified as gating-sensitive upon mutation. Mapping the network of correlated residues onto the 3D structure of the Kv channel from Aeropyrum pernix disclosed correlations between residues in the voltage-sensor paddle and the pore region, including regions that are involved in the gating transition. We discuss these findings with respect to the evolutionary constraints acting on the channel's various domains. The software is available on our website.

8.
MOTIVATION: The physico-chemical characteristics of proteins that underlie the specific folding of the polypeptide chain and protein function are known to be evolutionarily conserved. Detecting such characteristics when analyzing homologous sequences would substantially expand our knowledge of protein function, structure, and evolution. These characteristics are kept constant, in particular, by co-ordinated substitutions: the destabilizing effect of a substitution may be compensated by another substitution at a different position within the same protein, making the overall change in the characteristic insignificant. Consequently, patterns of co-ordinated substitutions carry important information on the conserved physico-chemical properties of proteins. Investigating them requires methods and software for correlation analysis of protein sequences that are available to a wide range of users. RESULTS: A software package for analyzing correlated amino acid substitutions at different positions within aligned protein sequences was developed. The approach searches for evolutionarily conserved physico-chemical characteristics of proteins using information on the pairwise correlations of amino acid substitutions at different protein positions. The software was applied to the DNA-binding domains of the homeodomain class. Two conserved physico-chemical characteristics were found to be preserved by co-ordinated substitutions at certain groups of positions in the protein sequence. Possible functional roles of these characteristics are discussed. AVAILABILITY: The program package is available at http://wwwmgs.bionet.nsc.ru/programs/CRASP/.
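
As an illustration of the idea of pairwise correlation of a physico-chemical property between alignment positions, here is a minimal sketch. It is not the CRASP implementation: the choice of the Kyte-Doolittle hydropathy scale, the plain Pearson coefficient, and the function names are all assumptions made for the example, and gapped columns are not handled.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values (one possible property scale; assumed here).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def property_column(alignment, i, scale=KD):
    """Property value of the residue at column i in every sequence."""
    return [scale[seq[i]] for seq in alignment]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def position_correlation(alignment, i, j, scale=KD):
    """Correlation of the chosen property between columns i and j."""
    return pearson(property_column(alignment, i, scale),
                   property_column(alignment, j, scale))


if __name__ == "__main__":
    aln = ["AD", "VK", "IR"]  # toy ungapped alignment
    print(position_correlation(aln, 0, 1))
```

A strongly negative coefficient between two columns is the kind of signal one would expect from compensatory substitutions, where a rise in the property at one position is offset by a fall at the other.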

10.
The alpha-mating pheromone receptor encoded by the yeast STE2 gene is a G protein coupled receptor that initiates signaling via a MAP kinase pathway that prepares haploid cells for mating. To establish the range of allowed amino acid substitutions within transmembrane segments of this receptor, we conducted extensive random mutagenesis of receptors followed by screening for receptor function. A total of 157 amino acid positions in seven different mutagenic libraries corresponding to the seven predicted transmembrane segments were analyzed, yielding 390 alleles that retain at least 60% of normal signaling function. These alleles contained a total of 576 unique amino acid substitutions, including 61% of all the possible amino acid changes that can arise from single base substitutions. The receptor exhibits a surprising tolerance for amino acid substitutions. Every amino acid in the mutagenized regions of the transmembrane regions could be substituted by at least one other residue. Polar amino acids were tolerated in functional receptors at 115 different positions (73% of the total). Hydrophobic amino acids were tolerated in functional receptors at all mutagenized positions. Substitutions introducing proline residues were recovered at 53% of all positions where they could be brought about by single base changes. Residues with charged side-chains could also be tolerated at 53% of all positions where they were accessible through single base changes. The spectrum of allowed amino acid substitutions was characterized in terms of the hydrophobicity, radius of gyration, and charge of the allowed substitutions and mapped onto alpha-helical structures. By comparing the patterns of allowed substitutions with the recently determined structure of rhodopsin, structural features indicative of helix-helix interactions can be discerned in spite of the extreme sequence divergence between these two proteins.

11.
A set of aligned homologous protein sequences is divided into two groups of m and n sequences, each containing sequences from the most closely related organisms. The position dissimilarity between the two groups is defined as the number of mismatches among all m × n possible pairs of amino acid residues at that position (one residue from each group), divided by m × n. A ten-position average of the dissimilarity values is plotted against the number of the first position in the window. The area between the dissimilarity profile and its mean-value line characterizes the overall irregularity of amino acid substitutions along the protein sequences. If this area exceeds the average area for 1000 random profiles by more than two standard deviations, the profile extrema that contain the "surplus" area are cut off. The cut-off stretches are likely to be variable and constant regions. If necessary, each stretch may be tested separately and assessed statistically using a standard-size sample of artificial protein families. Intergroup comparison of protein sequences reveals high overall irregularity of amino acid substitutions and identifies variable and conserved regions in all families considered: phospholipases A2, aspartate aminotransferases, alpha-subunits of Na+,K(+)-ATPase, L- and M-subunits of the photosynthetic bacterial photoreaction centre, and human rhodopsins.
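
The quantities defined in this abstract can be written compactly as follows. The symbols are introduced here for illustration and do not appear in the abstract: $a_{ik}$ and $b_{jk}$ denote the residue at position $k$ in the $i$-th sequence of the first group and the $j$-th sequence of the second, $L$ is the alignment length, and taking the area as a sum of absolute deviations is an assumed reading of "area between the profile and its mean-value line".

```latex
% Between-group dissimilarity at position k
% ([.] equals 1 when the condition holds, 0 otherwise)
D_k = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ a_{ik} \neq b_{jk} \right]

% Ten-position average, indexed by the first position of the window
\bar{D}_k = \frac{1}{10} \sum_{t=0}^{9} D_{k+t}, \qquad k = 1, \dots, L - 9

% Area between the profile and its mean-value line (assumed form)
\mu = \frac{1}{L-9} \sum_{k=1}^{L-9} \bar{D}_k, \qquad
A = \sum_{k=1}^{L-9} \left| \bar{D}_k - \mu \right|

% Significance criterion against 1000 random profiles
A > \langle A_{\mathrm{rand}} \rangle + 2\,\sigma_{\mathrm{rand}}
```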

12.
Many protein regions have been shown to be intrinsically disordered, lacking unique structure under physiological conditions. These intrinsically disordered regions are not only very common in proteomes, but also crucial to the function of many proteins, especially those involved in signaling, recognition, and regulation. The goal of this work was to identify the prevalence, characteristics, and functions of conserved disordered regions within protein domains and families. A database was created to store the amino acid sequences of nearly one million proteins and their domain matches from the InterPro database, a resource integrating eight different protein family and domain databases. Disorder prediction was performed on these protein sequences. Regions of sequence corresponding to domains were aligned using a multiple sequence alignment tool. From this initial information, regions of conserved predicted disorder were found within the domains. The search identified runs of consecutive positions in the multiple sequence alignments in which 90% or more of the sequences were predicted to be disordered. The procedure was constrained to report only regions of conserved disorder prediction at least 20 amino acids in length. The results of this work included 3,653 regions of conserved disorder prediction, found within 2,898 distinct InterPro entries. Most regions of conserved predicted disorder detected were short, with fewer than 10% of those found exceeding 30 residues in length.
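
The search rule described above (runs of consecutive alignment columns where at least 90% of sequences are predicted disordered, at least 20 positions long) can be sketched as follows. The input encoding is an assumption made for the example: each aligned sequence is given as a string of per-column labels, with `D` for predicted disorder, `O` for order and `-` for a gap, and gaps are counted as not disordered. The function name is hypothetical.

```python
def conserved_disorder_regions(labels, min_fraction=0.9, min_length=20):
    """Return half-open (start, end) column ranges in which at least
    `min_fraction` of sequences are labelled 'D' at every column and the
    run spans at least `min_length` columns.

    `labels` is a list of equal-length strings, one per aligned sequence
    (assumed encoding: 'D' disordered, 'O' ordered, '-' gap)."""
    n_seq = len(labels)
    n_col = len(labels[0])
    regions = []
    start = None
    # Iterate one column past the end so a run reaching the last column closes.
    for k in range(n_col + 1):
        passes = (k < n_col and
                  sum(row[k] == "D" for row in labels) / n_seq >= min_fraction)
        if passes and start is None:
            start = k
        elif not passes and start is not None:
            if k - start >= min_length:
                regions.append((start, k))
            start = None
    return regions


if __name__ == "__main__":
    base = "O" * 5 + "D" * 25 + "O" * 5
    rows = [base] * 10
    # One sequence is predicted ordered at column 10: 9 of 10 still pass.
    rows[0] = base[:10] + "O" + base[11:]
    print(conserved_disorder_regions(rows))
```

Treating gaps as "not disordered" is the conservative choice; computing the fraction over non-gap residues only would be an equally plausible reading of the abstract.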

13.
The protein sequences of seven members of the superoxide dismutase (SOD) family from halophilic archaebacteria have been aligned and compared with each other and with the homologous Mn and Fe SOD sequences from eubacteria and the methanogenic archaebacterium Methanobacterium thermoautotrophicum. Of 199 common residues in the SOD proteins from halophilic archaebacteria, 125 are conserved in all seven sequences, and 64 of these are encoded by single unique triplets. The 74 remaining positions exhibit a high degree of variability, and for almost half of these, the encoding triplets are connected by at least two nonsynonymous nucleotide substitutions. The majority of nucleotide substitutions within the seven genes are nonsynonymous and result in amino acid replacement in the respective protein; silent third-codon-position (synonymous) substitutions are unexpectedly rare. Halophilic SODs contain 30 specific residues that are not found at the corresponding positions of the methanogenic or eubacterial SOD proteins. Seven of these are replacements of highly conserved amino acids in eubacterial SODs that are believed to play an important role in the three-dimensional structure of the protein. Residues implicated in formation of the active site, catalysis, and metal ion binding are conserved in all Mn and Fe SODs. Molecular phylogenies based on parsimony and neighbor-joining methods coherently group the halophile sequences but surprisingly fail to distinguish between the Mn SOD of Escherichia coli and the Fe SOD of M. thermoautotrophicum as the outgroup. These comparisons indicate that as a group, the SODs of halophilic archaebacteria have many unique and characteristic features. At the same time, the patterns of nucleotide substitution and amino acid replacement indicate that these genes and the proteins that they encode continue to be subject to strong and changing selection. 
This selection may be related to the presence of oxygen radicals and the inter- and intracellular composition and concentration of metal cations.

14.
15.
We have developed a new method of detecting common spatial arrangements of backbone fragments in proteins. This method allows corresponding fragments to occur in a different order in respective amino acid sequences. We applied this method to detect structural similarities between an acid protease, endothiapepsin, and all other proteins in the protein data bank. Significant similarities were found not only with other acid proteases but also with virus proteases and with proteins having different functions. The possible biological meaning of these similarities is discussed.

16.
An algorithm is presented for localizing variable and constant regions in homologous protein sequences. A set of aligned protein sequences is divided into two groups of m and n sequences, each containing sequences from the most closely related species. The position dissimilarity between the two groups is defined as the number of mismatches among all m × n possible pairs of amino acid residues at that position (one residue from each group), divided by m × n. The position dissimilarity of the m sequences within a group is defined as the number of mismatches among all m × (m−1)/2 possible pairs of amino acid residues, divided by m × (m−1)/2. A ten-position average of the dissimilarity values is plotted against the number of the first position in the window. The area between the dissimilarity profile and its mean-value line characterizes the overall irregularity of amino acid substitutions along the protein sequences. If this area exceeds the average area for 1000 random profiles by more than two standard deviations, the profile extrema that contain the "surplus" area are cut off. The cut-off stretches are likely to be variable and constant regions. For between-group comparisons, the overall irregularity of amino acid substitutions is very high in all protein families considered: phospholipases A2, aspartate aminotransferases, alpha-subunits of Na+,K(+)-ATPase, L- and M-subunits of the photosynthetic bacterial photoreaction centre, and human rhodopsins.
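
A minimal sketch of the dissimilarity calculations described in this abstract is given below. The function names are hypothetical, gaps are treated as ordinary characters, and measuring the area as a sum of absolute deviations from the mean is an assumed interpretation. The randomization test against 1000 random profiles is not reproduced because the abstract does not say how the random profiles are generated.

```python
def between_group_dissimilarity(group_a, group_b, k):
    """Fraction of the m x n cross-group residue pairs that differ at column k."""
    mismatches = sum(a[k] != b[k] for a in group_a for b in group_b)
    return mismatches / (len(group_a) * len(group_b))


def within_group_dissimilarity(group, k):
    """Fraction of the m(m-1)/2 within-group residue pairs that differ at column k."""
    m = len(group)
    pairs = m * (m - 1) // 2
    mismatches = sum(group[i][k] != group[j][k]
                     for i in range(m) for j in range(i + 1, m))
    return mismatches / pairs


def windowed_profile(values, window=10):
    """Sliding-window average, indexed by the first position of each window."""
    return [sum(values[s:s + window]) / window
            for s in range(len(values) - window + 1)]


def irregularity_area(profile):
    """Area between the profile and its mean-value line
    (assumed here to be the sum of absolute deviations)."""
    mu = sum(profile) / len(profile)
    return sum(abs(v - mu) for v in profile)


if __name__ == "__main__":
    group_a = ["AC", "AC"]
    group_b = ["AG", "TG", "AG"]
    per_column = [between_group_dissimilarity(group_a, group_b, k)
                  for k in range(2)]
    print(per_column)
    print(within_group_dissimilarity(group_b, 0))
```

For a real alignment, the per-column values would be passed through `windowed_profile` with the ten-position window before computing the area.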

17.
Study of structure/function relationships constitutes an important field of research, especially for modification of protein function and drug design. However, the fact that rational design (i.e. the modification of amino acid sequences by means of directed mutagenesis, based on knowledge of the three-dimensional structure) appears to be much less efficient than irrational design (i.e. random mutagenesis followed by in vitro selection) clearly indicates that we understand little about the relationships between primary sequence, three-dimensional structure and function. The use of evolutionary approaches and concepts will bring insights to this difficult question. The increasing availability of multigene family sequences that has resulted from genome projects has inspired the creation of novel in silico evolutionary methods to predict details of protein function in duplicated (paralogous) proteins. The underlying principle of all such approaches is to compare the evolutionary properties of homologous sequence positions in paralogs. It has been proposed that the positions that show switches in substitution rate over time--i.e., 'heterotachous sites'--are good indicators of functional divergence. However, it appears that heterotachy is a much more general process, since most variable sites of homologous proteins with no evidence of functional shift are heterotachous. Similarly, it appears that switches in substitution rate are as frequent when paralogous sequences are compared as when orthologous sequences are compared. Heterotachy, instead of being indicative of functional shift, may more generally reflect a less specific process related to the many intra- and inter-molecular interactions compatible with a range of more or less equally viable protein conformations. These interactions will lead to different constraints on the nature of the primary sequences, consistently with theories suggesting the non-independence of substitutions in proteins. 
However, a specific type of amino acid variation might constitute a good indicator of functional divergence: substitutions occurring at positions that are generally slowly evolving. Such substitutions at constrained sites are indeed much more frequent soon after gene duplication. The identification and analysis of these sites by complementing structural information with evolutionary data may represent a promising direction for future studies dealing with the functional characterization of an ever increasing number of multi-gene families identified by complete genome analysis.

18.
T Palzkill, D Botstein. Proteins, 1992, 14(1): 29-44
A new analytical mutagenesis technique is described that involves randomizing the DNA sequence of a short stretch of a gene (3-6 codons) and determining the percentage of all possible random sequences that produce a functional protein. A low percentage of functional random sequences in a complete library of random substitutions indicates that the region mutagenized is important for the structure and/or function of the protein. Repeating the mutagenesis over many regions throughout a protein gives a global perspective of which amino acid sequences in a protein are critical. We applied this method to 66 codons of the gene encoding TEM-1 beta-lactamase in 19 separate experiments. We found that TEM-1 beta-lactamase is extremely tolerant of amino acid substitutions: on average, 44% of all mutants with random substitutions function and 20% of the substitutions are expressed, secreted, and fold well enough to function at levels similar to those for the wild-type enzyme. We also found a few exceptional regions where only a few random sequences function. Examination of the X-ray structures of homologous beta-lactamases indicates that the regions most sensitive to substitution are in the vicinity of the active site pocket or buried in the hydrophobic core of the protein. DNA sequence analysis of functional random sequences has been used to obtain more detailed information about the amino acid sequence requirements for several regions and this information has been compared to sequence conservation among several related beta-lactamases.

19.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often follow (though not always) a universal code of amino acid-base recognition and the effects of amino acid mutations can be most easily understood for these interactions.

20.
KNOX homeodomain (HD) proteins encoded by KNOTTED1-like homeobox genes (KNOX genes) are considered to be important regulators of plant developmental and morphogenetic events. We found that OSH3, one of the KNOX genes isolated from a cultivar of Oryza sativa (Nipponbare), encodes a novel HD, which has two amino acid substitutions at invariant positions. Sequence analysis of OSH3 from various domesticated and wild species of rice has revealed that these substitutions are found only in the Japonica and Javanica types of O. sativa, two groups of domesticated rice in Asia. Surprisingly, nucleotide sequences in the first intron are almost entirely conserved in the rice strains that have the substitutions at the invariant amino acids. Overexpression studies revealed that these invariant amino acids are critical for the function of OSH3 in vivo. The facts that these substitutions occurred specifically at the functionally important amino acids and that the sequences are conserved in the intron, where neutral mutations accumulate, suggest that the substitutions at the invariant positions of OSH3 have been fixed by artificial selection during domestication. Based on these observations, we hypothesize that OSH3 is responsible for one of the traits that were selectively introduced during the domestication of most Japonica and some Javanica types of rice.
