Similar articles
20 similar articles found
1.
Evidence from diverse studies, such as protein design experiments and analysis of the emergence of drug resistance in human immunodeficiency virus 1 (HIV-1), indicates that protein function can be diminished or altered by mutations at positions distant from the classic 'functional' site. Furthermore, results from correlation analysis of the ligand-binding domain of nuclear receptors suggest that mutation events at positions distributed throughout a protein domain may be involved in functional diversification during the evolution of homologous domain families. This review explores potential applications for a protein design procedure based on correlated substitutions.

2.
The neutral theory of molecular evolution states that most mutations are deleterious or neutral. It follows that the evolutionary rate of a given position in an alignment is a function of the level of constraint acting on this position. Inferring evolutionary rates from a set of aligned sequences is hence a powerful method to detect functionally and/or structurally important positions in a protein. Some positions, however, may be constrained while having a high substitution rate, provided these substitutions do not affect the biochemical property under constraint. Here, I introduce a new evolutionary rate measure accounting for the evolution of specific biochemical properties (e.g., volume, polarity, and charge). I then present a new statistical method based on the comparison of two rate measures: a site is said to be constrained for property X if it shows an unexpectedly high conservation of X given its total evolutionary rate. Compared to single-rate methods, the two-rate method offers several advantages: it (i) allows assessment of the significance of the constraint, (ii) provides information on the type of constraint acting on each position, and (iii) detects positions that are not proposed by previous methods. I apply this method to a 200-sequence data set of triosephosphate isomerase and report significant cases of positions constrained for polarity, volume, or charge. The three-dimensional localization of these positions shows that they are of potential interest to the molecular evolutionist and to the biochemist.

3.
Patrick Slama 《Proteins》2018,86(1):3-12
Residues at different positions of a multiple sequence alignment sometimes evolve together, due to a correlated structural or functional stress at these positions. Co‐evolution has thus been evidenced computationally in multiple proteins or protein domains. Here, we wish to study whether an evolutionary stress is exerted on a sequence alignment across protein domains, i.e., on longer sequence separations than within a single protein domain. JmjC‐containing lysine demethylases were chosen for analysis, as a follow‐up to previous studies; these proteins are important multidomain epigenetic regulators. In these proteins, the JmjC domain is responsible for the demethylase activity, and surrounding domains interact with histones, DNA or partner proteins. This family of enzymes was analyzed at the sequence level, in order to determine whether the sequence of JmjC‐domains was affected by the presence of a neighboring JmjN domain or PHD finger in the protein. Multiple positions within JmjC sequences were shown to have their residue distributions significantly altered by the presence of the second domain. Structural considerations confirmed the relevance of the analysis for JmjN‐JmjC proteins, while among PHD‐JmjC proteins, the length of the linker region could be correlated to the residues observed at the most affected positions. The correlation of domain architecture with residue types at certain positions, as well as that of overall architecture with protein function, is discussed. The present results thus evidence the existence of an across‐domain evolutionary stress in JmjC‐containing demethylases, and provide further insights into the overall domain architecture of JmjC domain‐containing proteins.

4.
Protein phosphorylation is a key mechanism to regulate protein functions. However, the contribution of this protein modification to species divergence is still largely unknown. Here, we studied the evolution of mammalian phosphoregulation by comparing the human and mouse phosphoproteomes. We found that 84% of the positions that are phosphorylated in one species or the other are conserved at the residue level. Twenty percent of these conserved sites are phosphorylated in both species. This proportion is 2.5 times more than expected by chance alone, suggesting that purifying selection is preserving phosphoregulation. However, we show that the majority of the sites that are conserved at the residue level are differentially phosphorylated between species. These sites likely result from false-negative identifications due to incomplete experimental coverage, false-positive identifications and non-functional sites. In addition, our results suggest that at least 5% of them are likely to be true differentially phosphorylated sites and may thus contribute to the divergence in phosphorylation networks between mouse and humans, despite residue conservation between orthologous proteins. We also showed that evolutionary turnover of phosphosites at adjacent positions (in a distance range of up to 40 amino acids) in human or mouse leads to an overestimation of the divergence in phosphoregulation between these two species. These sites tend to be phosphorylated by the same kinases, supporting the hypothesis that they are functionally redundant. Our results support the hypothesis that the evolutionary turnover of phosphorylation sites contributes to the divergence in phosphorylation profiles while preserving phosphoregulation. Overall, our study provides advanced analyses of mammalian phosphoproteomes and a framework for the study of their contribution to phenotypic evolution.

5.
Many protein pairs that share the same fold do not have any detectable sequence similarity, providing a valuable source of information for studying sequence-structure relationship. In this study, we use a stringent data set of structurally similar, sequence-dissimilar protein pairs to characterize residues that may play a role in the determination of protein structure and/or function. For each protein in the database, we identify amino-acid positions that show residue conservation within both close and distant family members. These positions are termed "persistently conserved". We then proceed to determine the "mutually" persistently conserved (MPC) positions: those structurally aligned positions in a protein pair that are persistently conserved in both pair mates. Because of their intra- and interfamily conservation, these positions are good candidates for determining protein fold and function. We find that 45% of the persistently conserved positions are mutually conserved. A significant fraction of them are located in critical positions for secondary structure determination, they are mostly buried, and many of them form spatial clusters within their protein structures. A substitution matrix based on the subset of MPC positions shows two distinct characteristics: (i) it is different from other available matrices, even those that are derived from structural alignments; (ii) its relative entropy is high, emphasizing the special residue restrictions imposed on these positions. Such a substitution matrix should be valuable for protein design experiments.

6.
Du QS  Meng JZ  Wang CH  Long SY  Huang RB 《PloS one》2011,6(12):e28206

Background

The proteins in a family, which perform similar biological functions, may have very different amino acid compositions, but they must share similar 3D structures and keep a stable central region. In the conserved structural region, similar biological functions are performed by two or three catalytic residues in collaboration with several functional residues at key positions. Communication signals are conducted through a position network, adjusting the biological functions in the protein family.

Methodology

A computational approach, namely structural position correlation analysis (SPCA), is developed to analyze the correlation between structural segments (or positions). The basic hypothesis of SPCA is that in a protein family structural conservation is more important than sequence conservation, and that local structural changes may contain information about the evolution of biological function. A standard protein P(0) is defined for a protein family; it consists of the most frequent amino acids and takes the average structure of the family. The foundational variables of SPCA are the structural position displacements between the standard protein P(0) and the individual proteins Pi of the family. The structural positions are organized as segments, which are the stable units in the structural displacements of the protein family. The differences in biological function among family members are determined by the structural position displacements of each individual protein Pi relative to the standard protein P(0). Correlation analysis is used to analyze the communication network among segments.

Conclusions

Structural position correlation analysis (SPCA) is able to find correlations among the structural segments (or positions) in a protein family that cannot be detected by amino acid sequence and frequency-based methods. The functional communication network among the structural segments (or positions) in a protein family, as revealed by the SPCA approach, illustrates distant allosteric interactions well and contains valuable information for protein engineering studies.

7.
When protein sequences divergently evolve under functional constraints, some individual amino acid replacements that reverse the charge (e.g. Lys to Asp) may be compensated by a replacement at a second position that reverses the charge in the opposite direction (e.g. Glu to Arg). When these side-chains are near in space (proximal), such double replacements might be driven by natural selection, if either is selectively disadvantageous, but both together restore fully the ability of the protein to contribute to fitness (are together "neutral"). Accordingly, many have sought to identify pairs of positions in a protein sequence that suffer compensatory replacements, often as a way to identify positions near in space in the folded structure. A "charge compensatory signal" might manifest itself in two ways. First, proximal charge compensatory replacements may occur more frequently than predicted from the product of the probabilities of individual positions suffering charge reversing replacements independently. Conversely, charge compensatory pairs of changes may be observed to occur more frequently in proximal pairs of sites than in the average pair. Normally, charge compensatory covariation is detected by comparing the sequences of extant proteins at the "leaves" of phylogenetic trees. We show here that the charge compensatory signal is more evident when it is sought by examining individual branches in the tree between reconstructed ancestral sequences at nodes in the tree. Here, we find that the signal is especially strong when the position pairs are in a single secondary structural unit (e.g. alpha helix or beta strand) that brings the side-chains suffering charge compensatory covariation near in space, and may be useful in secondary structure prediction.
Also, "node-node" and "node-leaf" compensatory covariation may be useful to identify the better of two equally parsimonious trees, in a way that is independent of the mathematical formalism used to construct the tree itself. Further, compensatory covariation may provide a signal that indicates whether an episode of sequence evolution contains more or less divergence in functional behavior. Compensatory covariation analysis on reconstructed evolutionary trees may become a valuable tool to analyze genome sequences, and to use these analyses to extract biomedically useful information from proteome databases.
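
The branch-wise search for charge compensatory pairs described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the charge table (Lys/Arg positive, Asp/Glu negative, His treated as neutral), the function names, and the toy sequences are all hypothetical and are not taken from the paper, and no test of spatial proximity or statistical significance is attempted.

```python
# Simplified charge table (assumption): Lys/Arg positive, Asp/Glu negative,
# everything else, including His, treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charge_reversals(ancestor, descendant):
    """Map each position whose charge flips along a branch to +2 or -2."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    flips = {}
    for pos, (old, new) in enumerate(zip(ancestor, descendant)):
        before, after = CHARGE.get(old, 0), CHARGE.get(new, 0)
        if before * after == -1:  # one state positive, the other negative
            flips[pos] = after - before
    return flips


def compensatory_pairs(ancestor, descendant):
    """Return 0-based position pairs whose charge reversals cancel out."""
    flips = charge_reversals(ancestor, descendant)
    positions = sorted(flips)
    return [
        (i, j)
        for idx, i in enumerate(positions)
        for j in positions[idx + 1:]
        if flips[i] + flips[j] == 0
    ]


# Hypothetical reconstructed sequences at the two ends of one branch.
pairs = compensatory_pairs("AKLEG", "ADLRG")
```

Here `pairs` holds the single pair of positions 1 and 3 (Lys to Asp, Glu to Arg). Comparing two reconstructed node sequences, rather than two leaves, mirrors the "node-node" comparison the abstract argues gives a clearer signal.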

8.
The presence in proteins of amino acid residues that change in concert during evolution is associated with keeping the protein spatial structure and functions constant. As is the case with morphological features, correlated substitutions may become the cause of homoplasies: the independent evolution of identical non-homologous adaptations. Our data obtained on model phylogenetic trees and corresponding sets of sequences have shown that the presence of correlated substitutions distorts the results of phylogenetic reconstructions. A method for accounting for co-evolving amino acid residues in phylogenetic analysis is proposed. According to this method, only a single site from each group of correlated amino acid positions should remain, whereas the other positions should not be used in further phylogenetic analysis. Simulations have shown that replacement of, on average, 8% of variable positions in a pair of model sequences by coordinately evolving amino acid residues is able to change the tree topology. The removal of such amino acid residues from sequences before phylogenetic analysis restores the correct topology.

9.
Gloor GB  Martin LC  Wahl LM  Dunn SD 《Biochemistry》2005,44(19):7156-7165
Information theory was used to identify nonconserved coevolving positions in multiple sequence alignments from a variety of protein families. Coevolving positions in these alignments fall into two general categories. One set is composed of positions that coevolve with only one or two other positions. These positions often display direct amino acid side-chain interactions with their coevolving partner. The other set comprises positions that coevolve with many others and are frequently located in regions critical for protein function, such as active sites and surfaces involved in intermolecular interactions and recognition. We find that coevolving positions are more likely to change protein function when mutated than are positions showing little coevolution. These results imply that information theory may be applied generally to find coevolving, nonconserved positions that are part of functional sites in uncharacterized protein families. We propose that these coevolving positions compose an important subset of the positions in an alignment, and may be as important to the structure and function of the protein family as are highly conserved positions.
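
The abstract above does not spell out which information-theoretic quantity was computed. A common choice for scoring coevolution between two alignment columns is mutual information, and the sketch below assumes that choice. The function name and the toy alignment are hypothetical; gaps, sequence weighting, and any normalization are ignored.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information, in bits, between two alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Toy alignment (hypothetical): rows are sequences, columns are positions.
alignment = ["AKE", "AKE", "ARD", "ARD"]
columns = list(zip(*alignment))
mi_conserved = column_mutual_information(columns[0], columns[1])
mi_coupled = column_mutual_information(columns[1], columns[2])
```

In this toy case the fully conserved first column carries no mutual information with any other column, while the second and third columns, which always change together, share one bit. This is why such measures pick out nonconserved coevolving positions rather than invariant ones.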

10.
11.
Previous reports detailing mutational effects within the hydrophobic core of human acidic fibroblast growth factor (FGF-1) have shown that a symmetric primary structure constraint is compatible with a stably folded protein. In the present report, we investigate symmetrically related pairs of buried hydrophobic residues in FGF-1 (termed "mini-cores") that are not part of the central core. The effect upon the stability and function of FGF-1 mutations designed to increase primary structure symmetry within these "mini-core" regions was evaluated. At symmetry-related positions 22, 64, and 108, the wild-type protein contains either Tyr or Phe side chains. The results show that either residue can be readily accommodated at these positions. At symmetry-related positions 42, 83, and 130, the wild-type protein contains either Cys or Ile side chains. While positions 42 and 130 can readily accommodate either Cys or Ile side chains, position 83 is substantially destabilized by substitution by Ile. Tertiary structure asymmetry in the vicinity of position 83 appears responsible for the inability to accommodate an Ile side chain at this position, and is known to contribute to functional half-life. A mutant form of FGF-1 with enforced primary structure symmetry at positions 22, 64, and 108 (all Tyr) and 42, 83, and 130 (all Cys) is shown to be more stable than the reference FGF-1 protein. The results support the hypothesis that a symmetric primary structure within a symmetric protein superfold represents a solution to achieving a foldable, stable polypeptide, and highlight the role that function may play in the evolution of asymmetry within symmetric superfolds.

12.
It is a central assumption of evolution that gene duplications provide the genetic raw material from which to create proteins with new functions. The increasing availability of multigene family sequences that has resulted from genome projects has inspired the creation of novel in silico approaches to predict details of protein function. The underlying principle of all such approaches is to compare the evolutionary properties of homologous sequence positions in paralogous proteins. It has been proposed that the positions that show switches in substitution rate over time, i.e., "heterotachous sites," are good indicators of functional divergence. Here, we analyzed the alpha and beta paralogous subunits of hemoglobin in search of such signatures. We found as many heterotachous sites in comparisons between groups of paralogous subunits (alpha/beta) as between orthologous ones (alpha/alpha, beta/beta). Thus, the importance of substitution rate shifts as predictors of specialization between protein subfamilies might be reconsidered. Instead, such shifts may reflect a more general process of protein evolution, consistent with the fact that they can be compatible with function conservation. As an alternative, we focused on those residues showing highly constrained states in two sequence groups, but different in each group, and we named them CBD (for "constant but different"). Unlike heterotachous positions, CBD sites were markedly overrepresented in paralogous (alpha/beta) comparisons relative to orthologous ones (alpha/alpha, beta/beta), identifying them as likely signatures of functional specialization between the two subunits. When superimposed onto the three-dimensional structure of hemoglobin, CBD positions consistently appeared to cluster preferentially on inter-subunit surfaces, two contact areas crucial to function in vertebrate tetrameric hemoglobin.
The identification and analysis of CBD sites by complementing structural information with evolutionary data may represent a promising direction for future studies dealing with the functional characterization of a growing number of multigene families identified by complete genome analyses.
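
The "constant but different" (CBD) idea described above can be illustrated with a short sketch. It assumes the strictest reading, in which a site must be fully invariant within each group; the abstract only says "highly constrained", so a real analysis would probably tolerate some variation. The function name and the toy sequences are hypothetical.

```python
def cbd_sites(group_a, group_b):
    """Return 0-based columns that are invariant within each group
    but carry a different residue in the two groups."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        states_a = {seq[pos] for seq in group_a}
        states_b = {seq[pos] for seq in group_b}
        if len(states_a) == 1 and len(states_b) == 1 and states_a != states_b:
            sites.append(pos)
    return sites


# Hypothetical aligned fragments standing in for two paralogous groups.
alpha_like = ["VLSA", "VLSG"]
beta_like = ["VHTA", "VHTC"]
sites = cbd_sites(alpha_like, beta_like)
```

With these toy groups, columns 1 and 2 qualify: each is constant within a group and differs between groups. Column 0 is constant everywhere and column 3 varies within groups, so neither is reported.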

13.
MOTIVATION: Compensating alterations during the evolution of protein families give rise to coevolving positions that contain important structural and functional information. However, a high background composed of random noise and phylogenetic components interferes with the identification of coevolving positions. RESULTS: We have developed a rapid, simple and general method based on information theory that accurately estimates the level of background mutual information for each pair of positions in a given protein family. Removal of this background results in a metric, MIp, that correctly identifies substantially more coevolving positions in protein families than any existing method. A significant fraction of these positions coevolve strongly with one or only a few positions. The vast majority of such position pairs are in contact in representative structures. The identification of strongly coevolving position pairs can be used to impose significant structural limitations and should be an important additional constraint for ab initio protein folding. AVAILABILITY: Alignments and program files can be found in the Supplementary Information.
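
The abstract above does not state how the background is estimated. The sketch below assumes the average product correction (APC) commonly associated with the MIp metric, in which the background for a pair of positions is the product of their mean mutual information with all other positions, divided by the overall mean. The function name and the toy matrix are hypothetical, and the input is taken to be a precomputed symmetric matrix of pairwise mutual information whose diagonal is ignored.

```python
def apc_corrected(mi):
    """Subtract an average-product background from a symmetric MI matrix.

    mi is a list of lists; diagonal entries are ignored and returned as 0.
    """
    n = len(mi)
    if n < 3:
        raise ValueError("need at least three positions")
    # Mean MI of each position with every other position.
    mean_per_pos = [
        sum(mi[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)
    ]
    # Mean MI over all distinct pairs.
    overall = sum(
        mi[i][j] for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                background = (
                    mean_per_pos[i] * mean_per_pos[j] / overall
                    if overall > 0 else 0.0
                )
                corrected[i][j] = mi[i][j] - background
    return corrected


# Toy MI matrix (hypothetical values): positions 0 and 1 are strongly coupled.
toy_mi = [
    [0.0, 1.0, 0.2],
    [1.0, 0.0, 0.2],
    [0.2, 0.2, 0.0],
]
toy_corrected = apc_corrected(toy_mi)
```

After correction the coupled pair (0, 1) keeps a positive score while the weakly related pairs fall slightly below zero. A matrix in which every pair has the same mutual information is corrected to zero everywhere, which matches the intent of removing a uniform background.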

14.
Summary: The nematode, Caenorhabditis elegans, has a six-member gene family encoding vitellogenins, the yolk protein precursors. These genes are expressed exclusively in the intestine of the adult hermaphrodite. Here we report the cloning of all five members of the homologous gene family from another Caenorhabditis species, Caenorhabditis briggsae. Nucleotide sequence analysis of these genes reveals they are about 85% identical to the C. elegans genes in the coding regions. Overall similarity is much reduced in noncoding and flanking regions. However, two repeated heptamers, previously identified in the upstream regions of the C. elegans genes, are largely conserved in both location and sequence in C. briggsae. Conservation of certain of these heptamers suggests that proteins bound at these positions may be especially important to promoter function and/or regulation. Comparative sequence analysis also suggests the possibility that the first 70 bases of the vitellogenin mRNAs can be folded into stable secondary structures. Almost all base differences between the two species occur in sequences predicted to be unpaired, suggesting that the ability to form intrastrand base pairs has been selected during Caenorhabditis evolution.

15.
Understanding the determinants of protein stability remains one of protein science's greatest challenges. There are still no computational solutions that calculate the stability effects of even point mutations with sufficient reliability for practical use. Amino acid substitutions rarely increase the stability of native proteins; hence, large libraries and high-throughput screens or selections are needed to stabilize proteins using directed evolution. Consensus mutations have proven effective for increasing stability, but these mutations are successful only about half the time. We set out to understand why some consensus mutations fail to stabilize, and what criteria might be useful to predict stabilization more accurately. Overall, consensus mutations at more conserved positions were more likely to be stabilizing in our model, triosephosphate isomerase (TIM) from Saccharomyces cerevisiae. However, positions coupled to other sites were more likely not to stabilize upon mutation. Destabilizing mutations could be removed both by removing sites with high statistical correlations to other positions and by removing nearly invariant positions at which "hidden correlations" can occur. Application of these rules resulted in identification of stabilizing mutations in 9 out of 10 positions, and amalgamation of all predicted stabilizing positions resulted in the most stable yeast TIM variant we produced (+8 °C). In contrast, a multimutant with 14 mutations each found to stabilize TIM independently was destabilized by 2 °C. Our results are a practical extension to the consensus concept of protein stabilization, and they further suggest the importance of positional independence in the mechanism of consensus stabilization.
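
The basic consensus step that the abstract above builds on, finding positions where a target protein differs from the most common residue in its family, might look like the sketch below. The function name and the toy family are hypothetical. The additional filters the authors describe (dropping sites correlated with other positions and nearly invariant sites) are not implemented here, and ties are broken by whichever residue appears first in the column.

```python
from collections import Counter


def consensus_mutations(alignment, target):
    """List (position, target_residue, consensus_residue, frequency) tuples
    for columns where the target differs from the most common residue."""
    if any(len(seq) != len(target) for seq in alignment):
        raise ValueError("all sequences must match the target length")
    candidates = []
    for pos, column in enumerate(zip(*alignment)):
        residue, count = Counter(column).most_common(1)[0]
        # Skip columns whose most common state is a gap.
        if residue != target[pos] and residue != "-":
            candidates.append((pos, target[pos], residue, count / len(column)))
    return candidates


# Hypothetical family alignment; the target is aligned to the same columns.
family = ["MKV", "MRV", "MRL", "MRV"]
suggestions = consensus_mutations(family, "MKV")
```

For this toy family the only suggestion is at position 1 (0-based), where the target has Lys and three of four sequences have Arg. Reporting the consensus frequency alongside each candidate makes it easy to rank by conservation, which the abstract identifies as one predictor of a stabilizing outcome.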

16.
Abstract— Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequences have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that the combination of nucleic acid and the translated amino acid coded character states into the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. 
Second, one character set may display no information concerning a phylogenetic hypothesis while the other character set may impart information to a hypothesis. These two possibilities are cases of non-independence; however, we argue that congruence in such cases can be thought of as increasing the weight of the particular phylogenetic hypothesis that is supported by those characters. In the third case, the two sources of character information for a particular codon may be entirely incongruent with respect to phylogenetic hypotheses concerning the taxa examined. In this last case the two character sets are independent in that information from neither can predict the character states of the other. Examples of these possibilities are discussed and the general applicability of combining these two sources of information for protein coding genes is presented using sequences from the homeobox region of 46 homeobox genes from Drosophila melanogaster to develop a hypothesis of genealogical relationship of these genes in this large multigene family.

17.
Abstract: Protein structures are much more conserved than sequences during evolution. Based on this observation, we investigate the consequences of structural conservation on protein evolution. We study seven of the most studied protein folds, determining that an extended neutral network in sequence space is associated with each of them. Within our model, neutral evolution leads to a non-Poissonian substitution process, due to the broad distribution of connectivities in neutral networks. The observation that the substitution process has non-Poissonian statistics has been used to argue against the original Kimura neutral theory, while our model shows that this is a generic property of neutral evolution with structural conservation. Our model also predicts that the substitution rate can strongly fluctuate from one branch to another of the evolutionary tree. The average sequence similarity within a neutral network is close to the threshold of randomness, as observed for families of sequences sharing the same fold. Nevertheless, some positions are more difficult to mutate than others. We compare such structurally conserved positions to positions conserved in protein evolution, suggesting that our model can be a valuable tool to distinguish structural from functional conservation in databases of protein families. These results indicate that a synergy between database analysis and structurally based computational studies can increase our understanding of protein evolution.

18.
Two spurious nodes were found in phylogenetic analyses of vertebrate rhodopsin sequences in comparison with well-established vertebrate relationships. These spurious reconstructions were well supported in bootstrap analyses and occurred independently of the method of phylogenetic analysis used (parsimony, distance, or likelihood). Use of this data set of vertebrate rhodopsin sequences allowed us to exploit established vertebrate relationships, as well as the considerable amount known about the molecular evolution of this gene, in order to identify important factors contributing to the spurious reconstructions. Simulation studies using parametric bootstrapping indicate that it is unlikely that the spurious nodes in the parsimony analyses are due to long branches or other topological effects. Rather, they appear to be due to base compositional bias at third positions, codon bias, and convergent evolution at nucleotide positions encoding the hydrophobic residues isoleucine, leucine, and valine. LogDet distance methods, as well as maximum-likelihood methods which allow for nonstationary changes in base composition, reduce but do not entirely eliminate support for the spurious resolutions. Inclusion of five additional rhodopsin sequences in the phylogenetic analyses largely corrected one of the spurious reconstructions while leaving the other unaffected. The additional sequences not only were more proximal to the corrected node, but were also found to have intermediate levels of base composition and codon bias as compared with neighboring sequences on the tree. This study shows that the spurious reconstructions can be corrected either by excluding third positions, as well as those encoding the amino acids Ile, Val, and Leu (which may not be ideal, as these sites can contain useful phylogenetic signal for other parts of the tree), or by the addition of sequences that reduce problems associated with convergent evolution.

19.
Variability of Evolutionary Rates of DNA
John H. Gillespie 《Genetics》1986,113(4):1077-1091
A statistical analysis of DNA sequences from four nuclear loci and five mitochondrial loci from different orders of mammals is described. A major aim of the study is to describe the variation in the rate of molecular evolution of proteins and DNA. A measure of rate variability is the statistic R, the ratio of the variance in the number of substitutions to the mean number. For proteins, R is found to be in the range 0.16 < R < 35.55, thus extending in both directions the values seen in previous studies. An analysis of codons shows that there is a highly significant excess of double substitutions in the first and second positions, but not in the second and third or first and third positions. The analysis of the dynamics of nucleotide evolution showed that the ergodic Markov chain models that are the basis of most published formulas for correcting for multiple substitutions are incompatible with the data. A bootstrap procedure was used to show that the evolution of the individual nucleotides, even at the third positions, shows the same variation in rates as seen in the proteins. It is argued that protein and silent DNA evolution are uncoupled, with the evolution at both levels showing patterns that are better explained by the action of natural selection than by neutrality. This conclusion is based primarily on a comparison of the nuclear and mitochondrial results.
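
The statistic R defined in the abstract above is a variance-to-mean ratio. A minimal sketch is given below; it uses the plain sample variance and applies none of the lineage or sampling corrections that a full analysis would need, so it should be read as an illustration of the definition only. The function name and the counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(substitution_counts):
    """Ratio of the sample variance to the mean of substitution counts."""
    if len(substitution_counts) < 2:
        raise ValueError("need counts from at least two lineages")
    average = mean(substitution_counts)
    if average == 0:
        raise ValueError("mean number of substitutions is zero")
    return variance(substitution_counts) / average


# Hypothetical numbers of substitutions along three lineages.
r_value = dispersion_index([2, 4, 6])
```

For these counts the mean and the sample variance are both 4, so the ratio is 1. Because a Poisson process has equal mean and variance, values of R well above 1 indicate more rate variation among lineages than a simple constant-rate Poisson model would produce.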

20.
Toll-like receptors (TLRs) are a major group of proteins that recognize molecular components of infectious agents, known as pathogen associated molecular patterns (PAMPs). The structure of these genes is similar and characterized by the presence of an ectodomain, a signal transmembrane segment and a highly conserved cytoplasmic domain. The latter domain is homologous to the human interleukin-1 receptor (IL1R) and human IL-18 receptor (IL-18R) and is designated the TIR domain. The TIR domain of the TLR genes has been suggested to be highly conserved, with its evolution driven by purifying selection. Variability and evolution of the TIR sequences of the TLR2 gene were studied in three hare populations from Tunisia with different ecological characteristics (NT: North Tunisia, Mediterranean climate; CT: Central Tunisia, semi-arid climate; ST: South Tunisia, arid climate). Sequencing of a 372 bp fragment of TIR2 revealed 25 alleles among 110 hares. Twenty variable nucleotide positions were detected, of which 7 were non-synonymous. The highest variability was observed in CT, with 16 polymorphic positions. In ST, only 4 polymorphic nucleotide positions were detected, with all diversity values lower than those recorded for the other two populations. By using several approaches, no positive selection was detected. However, evidence of purifying selection was found at two positions. The logistic models of the most common TIR2 protein variant that we ran to examine whether its occurrence was affected by climatic variation independent of the geographic sample location suggested only a longitudinal effect. Finally, the mapping of the non-synonymous mutations to the inferred tertiary protein structure showed that they were all localized in the different loop regions. Among all non-synonymous substitutions, three were suggested to be deleterious as evidenced by PROVEAN analysis.
The observed patterns of variability, characterized by low genetic diversity in ST, might suggest that the TIR region was more affected than other markers by genetic drift and/or that these patterns were shaped by different selective pressures under different ecological conditions. Notably, this low diversity was not detected by other (putatively neutral) microsatellite markers analysed in the course of other studies. However, low diversity was also found for two MHC class II adaptive immune genes. As expected from functionally important regions, the evolution of the TIR2 domain is mainly driven by purifying selection. However, the occurrence of deleterious non-synonymous substitutions might highlight the flexible evolution of the TIR genes and/or their interactions with other proteins.
