Similar articles
1.
Covariation between positions in a multiple sequence alignment may reflect structural, functional, and/or phylogenetic constraints and can be analyzed by a wide variety of methods. We explored several of these methods for their ability to identify covarying positions related to the divergence of a protein family at different hierarchical levels. Specifically, we compared seven methods on a model system composed of three nested sets of G‐protein‐coupled receptors (GPCRs) in which a divergence event occurred. The covariation methods analyzed were based on the χ² test, mutual information, substitution matrices, and perturbation methods. We first analyzed the dependence of the covariation scores on residue conservation (measured by sequence entropy), and then we analyzed the networking structure of the top pairs. Two of the seven methods, OMES (Observed minus Expected Squared) and ELSC (Explicit Likelihood of Subset Covariation), favored pairs with intermediate entropy and a networking structure with a central residue involved in several high‐scoring pairs. This networking structure was observed for all three sequence sets. In each case, the central residue corresponded to a residue known to be crucial for the evolution of the GPCR family and for subfamily specificity. These central residues can be viewed as evolutionary hubs, in relation to an epistasis‐based mechanism of functional divergence within a protein family. Proteins 2014; 82:2141–2156. © 2014 Wiley Periodicals, Inc.
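The OMES score and the sequence entropy used above to measure conservation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalization (dividing by the number of sequences), the absence of any gap handling, and the toy alignment are all choices made here.

```python
import math
from collections import Counter
from itertools import product


def column(alignment, i):
    """Residues at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def sequence_entropy(col):
    """Shannon entropy (natural log) of one alignment column."""
    n = len(col)
    return sum(-(c / n) * math.log(c / n) for c in Counter(col).values())


def omes(col_i, col_j):
    """Observed-minus-expected-squared covariation score for two columns.

    Assumed formulation: sum over residue pairs (x, y) of
    (N_obs(x, y) - N_exp(x, y))**2, divided by the number of sequences N,
    where N_exp(x, y) = n_i(x) * n_j(y) / N. Gaps are not treated specially.
    """
    n = len(col_i)
    counts_i, counts_j = Counter(col_i), Counter(col_j)
    observed = Counter(zip(col_i, col_j))
    total = 0.0
    for x, y in product(counts_i, counts_j):
        expected = counts_i[x] * counts_j[y] / n
        total += (observed[(x, y)] - expected) ** 2
    return total / n


# Toy alignment (hypothetical): positions 0 and 1 covary, position 2 is invariant.
alignment = ["AKL", "AKL", "GEL", "GEL"]
c0, c1, c2 = (column(alignment, i) for i in range(3))
print(omes(c0, c1), omes(c0, c2))  # 1.0 0.0
```

An invariant column (zero entropy) yields no OMES signal at all, which is one way to see why such scores depend on conservation and tend to favor positions of intermediate entropy.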

2.
Protein–protein interactions are essential to all aspects of life. Specific interactions result from evolutionary pressure at the interacting interfaces of partner proteins. However, evolutionary pressure is not homogeneous within the interface: for instance, not every residue contributes equally to the binding energy of the complex. To understand functional differences between residues within the interface, we analyzed their properties in the core and rim regions. Here, we characterized protein interfaces with two evolutionary measures, conservation and coevolution, using a comprehensive dataset of 896 protein complexes. These scores can detect different selection pressures at a given position in a multiple sequence alignment. We also analyzed how the number of interactions in which a residue is involved influences those evolutionary signals. We found that the coevolutionary signal is higher in the interface core than in the interface rim region. Additionally, the difference in coevolution between core and rim regions is comparable to the known difference in conservation between those regions. Considering proteins with multiple interactions, we found that conservation and coevolution increase with the number of different interfaces in which a residue is involved, suggesting that more constraints (i.e., a residue that must satisfy a greater number of interactions) allow fewer sequence changes at those positions, resulting in higher conservation and coevolution values. These findings shed light on the evolution of protein interfaces and provide information useful for identifying protein interfaces and predicting protein–protein interactions.

3.
The community of host species that a parasite infects is often explained by functional traits and phylogeny, predicting that closely related hosts or those with particular traits share more parasites with other hosts. Previous research has examined parasite community similarity by regressing pairwise parasite community dissimilarity between two host species against host phylogenetic distance. However, pairwise approaches cannot target specific host species responsible for disproportionate levels of parasite sharing. To better identify why some host species contribute differentially to parasite diversity patterns, we represent parasite sharing using ecological networks consisting of host species connected by instances of shared parasitism. These networks can help identify host species and traits associated with high levels of parasite sharing that may subsequently identify important hosts for parasite maintenance and transmission within communities. We used global‐scale parasite sharing networks of ungulates, carnivores, and primates to determine if host importance – encapsulated by the network measures degree, closeness, betweenness, and eigenvector centrality – was predictable based on host traits. Our findings suggest that host centrality in parasite sharing networks is a function of host population density and range size, with range size reflecting both species geographic range and the home range of those species. In the full network, host taxonomic family became an important predictor of centrality, suggesting a role for evolutionary relationships between host and parasite species. More broadly, these findings show that trait data predict key properties of ecological networks, thus highlighting a role for species traits in understanding network assembly, stability, and structure.
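As an illustration of how a parasite-sharing network can be built, the sketch below projects a hypothetical host-to-parasite table onto a host-host network and computes degree, the simplest of the four centrality measures named in the abstract. The host and parasite labels, and the use of shared-parasite counts as edge weights, are assumptions made for illustration only.

```python
from itertools import combinations


def sharing_network(host_parasites):
    """Project a host -> set-of-parasites mapping onto a host-host network.

    Two hosts are linked when they share at least one parasite species; the
    edge weight is the number of shared parasites (an illustrative choice).
    """
    edges = {}
    for a, b in combinations(sorted(host_parasites), 2):
        shared = host_parasites[a] & host_parasites[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def degree(hosts, edges):
    """Degree centrality: number of other hosts each host shares parasites with."""
    deg = {h: 0 for h in hosts}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


# Hypothetical host and parasite labels, for illustration only.
hosts = {
    "host_A": {"p1", "p2"},
    "host_B": {"p2"},
    "host_C": {"p1", "p3"},
    "host_D": {"p4"},
}
edges = sharing_network(hosts)
print(degree(hosts, edges))  # {'host_A': 2, 'host_B': 1, 'host_C': 1, 'host_D': 0}
```

Closeness, betweenness, and eigenvector centrality would be computed on the same host-host network; only the projection step is specific to parasite sharing.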

4.
The representation of protein structures as small-world networks facilitates the search for topological determinants, which may relate to functionally important residues. Here, we aimed to investigate the performance of residue centrality, viewed as a family fold characteristic, in identifying functionally important residues in protein families. Our study is based on 46 families, including 29 enzyme and 17 non-enzyme families. Of the central positions identified, 80% corresponded to active site residues or residues in direct contact with these sites. For enzyme families, this percentage increased to 91%, while for non-enzyme families it decreased substantially to 48%. Of these central positions, 70% are located in catalytic sites in the enzyme families, 64% are in hetero-atom binding sites in families that bind hetero-atoms, and only 16% belong to protein-protein interfaces in families with protein-protein interaction data. These differences reflect the shape of the active site: enzyme active sites lie in surface clefts, hetero-atom binding residues are in deep cavities, while protein-protein interactions involve a more planar configuration. On the other hand, not all surface cavities or clefts are composed of central residues. Thus, closeness centrality identifies functionally important residues in enzymes. Although we focus here on binding sites, we expect the approach also to identify key residues for the integration and transmission of information to the rest of the protein, reflecting the relationship between fold and function. Residue centrality is more conserved than the protein sequence, emphasizing the robustness of protein structures.
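Residue closeness centrality on a structure network can be sketched as follows, assuming each residue is represented by a single coordinate (for example, its C-alpha atom) and residues are linked when closer than a distance cutoff. The 7 Å cutoff and the toy coordinates are assumptions, not values from the study.

```python
import math
from collections import deque


def contact_graph(coords, cutoff=7.0):
    """Residue network: link residues whose representative atoms (e.g. C-alpha)
    lie within `cutoff` angstroms. The 7.0 default is an assumption."""
    adj = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def closeness(adj, source):
    """Closeness centrality of `source`: (reachable nodes - 1) / sum of
    shortest-path lengths, via breadth-first search on the unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical coordinates: five residues spaced 5 angstroms apart on a line.
coords = [(5.0 * k, 0.0, 0.0) for k in range(5)]
adj = contact_graph(coords)
scores = {i: closeness(adj, i) for i in adj}
print(max(scores, key=scores.get))  # 2, the middle residue
```

The residue with the shortest average path to all others scores highest, which is the sense in which central residues sit at the topological core of a fold.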

5.
The difficulty involved in following mandrills in the wild means that very little is known about social structure in this species. Most studies initially considered mandrill groups to be an aggregation of one-male/multifemale units, with males occupying central positions in a structure similar to those observed in the majority of baboon species. However, a recent study hypothesized that mandrills form stable groups with only two or three permanent males, and that females occupy more central positions than males within these groups. We used social network analysis methods to examine how a semi-free-ranging group of 19 mandrills is structured. We recorded all dyads of individuals that were in contact as a measure of association. The betweenness and the eigenvector centrality of each individual were calculated and correlated with kinship, age and dominance. Finally, we performed a resilience analysis by simulating the removal of the individuals with the highest betweenness and eigenvector centrality values. We found that related dyads were more frequently associated than unrelated dyads. Moreover, our results showed that the cumulative distributions of individual betweenness and eigenvector centrality followed a power function, which is characteristic of scale-free networks. This property showed that some group members, mostly females, occupied highly central positions. Finally, the resilience analysis showed that removing the two most central females split the network into small subgroups and increased the network diameter. Critically, this study confirms that females appear to occupy more central positions than males in mandrill groups. Consequently, these females appear to be crucial for group cohesion and probably play a pivotal role in this species.
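A minimal version of the resilience analysis can be sketched by computing betweenness with Brandes' algorithm, removing the highest-scoring individuals, and counting the resulting connected components. This covers betweenness only (not eigenvector centrality or network diameter), and the six-node network is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality for an unweighted, undirected graph
    (Brandes' algorithm). `adj` maps each node to a set of neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}  # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def n_components(adj, removed=frozenset()):
    """Number of connected components after deleting the `removed` nodes."""
    seen, count = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            for w in adj[queue.popleft()]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return count


# Hypothetical contact network: two brokers (F1, F2) bridge two pairs.
adj = {
    "F1": {"a", "b", "F2"}, "F2": {"F1", "c", "d"},
    "a": {"b", "F1"}, "b": {"a", "F1"},
    "c": {"d", "F2"}, "d": {"c", "F2"},
}
bc = betweenness(adj)
top2 = sorted(bc, key=bc.get, reverse=True)[:2]
print(sorted(top2), n_components(adj), n_components(adj, frozenset(top2)))
# ['F1', 'F2'] 1 2
```

Removing the two brokers splits one connected group into two subgroups, the same qualitative effect the abstract reports for the two most central females.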

6.
The functional repertoire of genes in eukaryotic organisms is enhanced by alternative splicing. Hence, a node in a tissue-specific protein–protein interaction network (TS PPIN) can be thought of as an ensemble of the various spliced protein products of the corresponding gene expressed in that tissue. Here we demonstrate that nodes occupying topologically central positions, characterized by high degree, betweenness, closeness, and eigenvector centrality values in TS PPINs of Homo sapiens, are associated with a high number of splice variants. We also show that the high “centrality” of these genes/nodes could in part be explained by the presence of a large number of promiscuous domains.
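Eigenvector centrality, one of the measures listed above, can be computed by power iteration. The sketch below assumes an unweighted, connected network and uses a hypothetical four-node star; it does not model splice variants or domains.

```python
import math


def eigenvector_centrality(adj, iterations=1000, tol=1e-10):
    """Eigenvector centrality by power iteration on (A + I).

    Adding the identity shifts every eigenvalue by 1, which leaves the leading
    eigenvector unchanged but prevents oscillation on bipartite graphs.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Hypothetical star-shaped network: one hub protein with three partners.
adj = {"hub": {"p1", "p2", "p3"}, "p1": {"hub"}, "p2": {"hub"}, "p3": {"hub"}}
c = eigenvector_centrality(adj)
print(round(c["hub"] / c["p1"], 4))  # 1.7321, i.e. sqrt(3)
```

Unlike degree, this score also rewards being connected to other well-connected nodes, which is why it is reported alongside degree, betweenness, and closeness.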

7.
Coevolving residues in a multiple sequence alignment provide evolutionary clues of biophysical interactions in 3D structure. Despite a rich literature describing amino acid coevolution within or between proteins and nucleic acid coevolution within RNA, to date there has been no direct evidence of coevolution between protein and RNA. The ribosome, a structurally conserved macromolecular machine composed of over 50 interacting protein and RNA chains, provides a natural example of RNA/protein interactions that likely coevolved. We provide the first direct evidence of RNA/protein coevolution by characterizing the mutual information in residue triplets from a multiple sequence alignment of ribosomal protein L22 and neighboring 23S RNA. We define residue triplets as three positions in the multiple sequence alignment, where one position is from the 23S RNA and two positions are from the L22 protein. We show that residue triplets with high mutual information are more likely than residue doublets to be proximal in 3D space. Some high mutual information residue triplets cluster in a connected series across the L22 protein structure, similar to patterns seen in protein coevolution. We also describe RNA nucleotides for which switching from one nucleotide to another (or between purines and pyrimidines) results in a change in amino acid distribution for proximal amino acid positions. Multiple crystal structures for evolutionarily distinct ribosome species can provide structural evidence for these differences. For one residue triplet, a pyrimidine in one species is a purine in another, and RNA/protein hydrogen bonds are present in one species but not the other. The results provide the first direct evidence of RNA/protein coevolution by using higher order mutual information, suggesting that biophysical constraints on interacting RNA and protein chains are indeed a driving force in their evolution.
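The abstract does not give the exact triplet formula, so the sketch below assumes one plausible definition: the mutual information between an RNA column and the joint state of two protein columns. The toy columns are hypothetical and chosen so that the dependence is invisible to doublet MI but captured by the triplet score.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy in bits of a list of (possibly tuple-valued) symbols."""
    n = len(symbols)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two aligned columns."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def triplet_mi(rna, prot_a, prot_b):
    """Assumed triplet score: I(R; A, B), the MI between one RNA column and
    the joint state of two protein columns."""
    return mutual_information(rna, list(zip(prot_a, prot_b)))


# Hypothetical columns: the RNA base depends on the protein pair jointly,
# but on neither protein position alone.
rna = ["A", "A", "G", "G"]
prot_a = ["K", "E", "K", "E"]
prot_b = ["K", "E", "E", "K"]
print(mutual_information(rna, prot_a), mutual_information(rna, prot_b))  # 0.0 0.0
print(triplet_mi(rna, prot_a, prot_b))  # 1.0
```

With only four sequences any fully distinct joint state gives maximal MI, so real analyses need many more sequences and some correction for small-sample bias.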

8.
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model in large-scale screenings of the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest that sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level.

9.
When amino acids vary during evolution, the outcome can be functionally neutral or biologically important. We previously found that substituting a subset of nonconserved positions, “rheostat” positions, can have surprising effects on protein function. Since changes at rheostat positions can facilitate functional evolution or cause disease, more examples are needed to understand their unique biophysical characteristics. Here, we explored whether “phylogenetic” patterns of change in multiple sequence alignments (such as positions with subfamily-specific conservation) predict the locations of functional rheostat positions. To that end, we experimentally tested eight phylogenetic positions in human liver pyruvate kinase (hLPYK), using 10–15 substitutions per position and biochemical assays that yielded five functional parameters. Five positions were strongly rheostatic and three were non‐neutral. To test the corollary that positions with low phylogenetic scores were not rheostat positions, we combined these phylogenetic positions with previously identified hLPYK rheostat, “toggle” (most substitutions abolished function), and “neutral” (all substitutions were like wild‐type) positions. Despite representing 428 variants, this set of 33 positions was poorly statistically powered. Thus, we turned to the in vivo phenotypic dataset for E. coli lactose repressor protein (LacI), which comprised 12–13 substitutions at 329 positions and could be used to identify rheostat, toggle, and neutral positions. Combined hLPYK and LacI results show that positions with strong phylogenetic patterns of change are more likely to exhibit rheostat substitution outcomes than neutral or toggle outcomes. Furthermore, phylogenetic patterns were more successful at identifying rheostat positions than were co‐evolutionary or eigenvector centrality measures of evolutionary change.
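The rheostat, toggle, and neutral categories defined above can be expressed as a simple rule. This sketch uses a single functional parameter and hypothetical cutoffs; the study itself used five functional parameters and its own criteria, which are not reproduced here.

```python
def classify_position(values, wild_type, wt_tolerance=0.2, dead_fraction=0.1):
    """Label one position from the measured function of its substitutions.

    Hypothetical cutoffs: a variant counts as 'like wild-type' if within
    +/- wt_tolerance (relative) of wild_type, and as 'dead' if below
    dead_fraction * wild_type.
    """
    like_wt = [abs(v - wild_type) <= wt_tolerance * wild_type for v in values]
    dead = [v < dead_fraction * wild_type for v in values]
    if all(like_wt):
        return "neutral"   # all substitutions behave like wild type
    if sum(dead) > len(values) / 2:
        return "toggle"    # most substitutions abolish function
    return "rheostat"      # substitutions produce a range of outcomes


# Hypothetical single-parameter measurements, normalized to wild type = 1.0.
print(classify_position([0.95, 1.1, 1.0], 1.0))        # neutral
print(classify_position([0.0, 0.05, 0.02, 1.0], 1.0))  # toggle
print(classify_position([0.3, 0.6, 1.5, 0.9], 1.0))    # rheostat
```

The order of the checks matters: "neutral" is tested first so that a position where every variant matches wild type is never reported as a rheostat.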

12.
Acetylcholinesterase (AChE) is an important enzyme in the nervous system. It terminates signal transmission at chemical synapses by degrading the neurotransmitter acetylcholine and was found to play a role in plaque formation in Alzheimer's disease. Several functional parts of its structure have been identified in the past. Here, we use a coarse-grained anisotropic network model approach based on structure data to analyze protein mechanics of AChE. Single contacts in the protein are "switched off" and the change in the intrinsic dynamics is measured. We correlate the insights gained with information about coevolution within the molecule derived from multiple sequence alignments. More than 300 AChE sequences were aligned and the mutual information between positions was calculated. From these structural, biophysical, and evolutionary data we were able to reveal sites of coevolutionary signatures in AChE, annotate them by the selective pressure induced for biophysical reasons, and further pave the way for a more detailed understanding of evolutionary boundary conditions for AChE.

13.
Keunwan Park, Dongsup Kim. Proteomics, 2009, 9(22): 5143–5154.
It has been suggested that a close relationship exists between gene essentiality and network centrality in protein–protein interaction networks. However, recent studies have reported somewhat conflicting results on this relationship. In this study, we investigated whether essential proteins could be inferred from network centrality alone. In addition, we determined which centrality measures describe the essentiality well. For this analysis, we devised new local centrality measures based on several well‐known centrality measures to more precisely describe the connection between network topology and essentiality. We examined two recent yeast protein–protein interaction networks using 40 different centrality measures. We discovered a close relationship between the path‐based localized information centrality and gene essentiality, which suggested underlying topological features that represent essentiality. We propose that two important features of the localized information centrality (proper representation of environmental complexity and the consideration of local sub‐networks) are the key factors that reveal essentiality. In addition, a random forest classifier showed reasonable performance at classifying essential proteins. Finally, the results of clustering analysis using centrality measures indicate that some network clusters are closely related with both particular biological processes and essentiality, suggesting that functionally related proteins tend to share similar network properties.

14.
One key element in understanding the molecular machinery of the cell is to understand the structure and function of each protein encoded in the genome. A very successful means of inferring the structure or function of a previously unannotated protein is via sequence similarity with one or more proteins whose structure or function is already known. Toward this end, we propose a means of representing proteins using pairwise sequence similarity scores. This representation, combined with a discriminative classification algorithm known as the support vector machine (SVM), provides a powerful means of detecting subtle structural and evolutionary relationships among proteins. The algorithm, called SVM-pairwise, when tested on its ability to recognize previously unseen families from the SCOP database, yields significantly better performance than SVM-Fisher, profile HMMs, and PSI-BLAST.
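The SVM-pairwise representation can be sketched as a feature vector of pairwise alignment scores against a training set. The scoring here (match +2, mismatch -1, linear gap -2) is a toy assumption rather than the substitution matrix and parameters used in the paper, and the SVM training step is omitted; the resulting vectors could be passed to any SVM implementation, such as scikit-learn's `SVC`.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (toy scoring,
    not a substitution matrix such as BLOSUM)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_features(query, training_set):
    """Represent `query` as its vector of similarity scores to every
    training sequence; this vector would be the SVM input."""
    return [smith_waterman(query, t) for t in training_set]


# Hypothetical sequences, for illustration only.
training = ["MKVL", "WWWW", "MKV"]
print(pairwise_features("MKVL", training))  # [8, 0, 6]
```

Because every protein is mapped to a vector of the same length (one entry per training sequence), a standard vector-space classifier can be trained on proteins of arbitrary length.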

15.
Correlated mutation analysis (CMA) has been used to investigate protein functional sites. However, CMA has suffered from low signal-to-noise ratio caused by meaningless phylogenetic signals or structural constraints. We present a new method, Structure-based Correlated Mutation Analysis (SCMA), which encodes coevolution scores into a protein structure network. A path-based network model is adapted to describe information transfer between residues, and the statistical significance is estimated by network shuffling. This model intrinsically assumes that residues in physical contact have a more reliable coevolution score than distant residues, and that coevolution in distant residues likely arises from a series of contacting and coevolving residues. In addition, coevolutionary coupling is statistically controlled to remove the structural effects. When applied to the rhodopsin structure, the SCMA method identified a much higher percentage of functional residues than the typical coevolution score (61% vs. 22%). In addition, statistically significant residues are used to construct the coevolved residue-residue subnetwork. The network has one highly connected node (retinal bound Lys296), indicating that Lys296 can induce and regulate most other coevolved residues in a variety of locations. The coevolved network consists of a few modular clusters which have distinct functional roles. This article is part of a Special Issue entitled: Computational Methods for Protein Interaction and Structural Prediction.

16.
Small protein fragments, and not just residues, can be used as basic building blocks to reconstruct networks of coevolved amino acids in proteins. Fragments often enter into physical contact with one another and play a major biological role in the protein. These interactions may be of multiple kinds and extend beyond binding specificity, allosteric regulation and folding constraints. Indeed, coevolving fragments are indicators of important information explaining folding intermediates, peptide assembly, key mutations with known roles in genetic diseases, distinct subfamily-dependent motifs and differentiated evolutionary pressures on protein regions. Coevolution analysis detects networks of fragment interactions and highlights a higher-order organization of fragments, demonstrating the importance of studying this structure at a deeper level. We demonstrate that it can be applied to protein families that are highly conserved or represented by few sequences, thereby enlarging the class of proteins where coevolution analysis can be performed and making large-scale coevolution studies a feasible goal.

17.
The CHAIN program: forging evolutionary links to underlying mechanisms
Proteins evolve new functions by modifying and extending the molecular machinery of an ancestral protein. Such changes show up as divergent sequence patterns, which are conserved in descendant proteins that maintain the divergent function. After multiply aligning a set of input sequences, the CHAIN program partitions the sequences into two functionally divergent groups and then outputs an alignment that is annotated to reveal the selective pressures imposed on divergent residue positions. If atomic coordinates are also provided, hydrogen bonds and other atomic interactions associated with various categories of divergent residues are graphically displayed. Such analyses establish links between protein evolutionary divergence and functionally crucial atomic features and, as a result, can suggest plausible molecular mechanisms for experimental testing. This is illustrated here by its application to bacterial clamp-loader ATPases.

18.

Background

While the conserved positions of a multiple sequence alignment (MSA) are clearly of interest, non-conserved positions can also be important because, for example, destabilizing effects at one position can be compensated by stabilizing effects at another position. Different methods have been developed to recognize the evolutionary relationship between amino acid sites, and to disentangle functional/structural dependencies from historical/phylogenetic ones.

Methodology/Principal Findings

We have used two complementary approaches to test the efficacy of these methods. In the first approach, we have used a new program, MSAvolve, for the in silico evolution of MSAs, which records a detailed history of all covarying positions, and builds a global coevolution matrix as the accumulated sum of individual matrices for the positions forced to co-vary, the recombinant coevolution, and the stochastic coevolution. We have simulated over 1600 MSAs for 8 protein families, which reflect sequences of different sizes and proteins with widely different functions. The calculated coevolution matrices were compared with the coevolution matrices obtained for the same evolved MSAs with different coevolution detection methods. In a second approach we have evaluated the capacity of the different methods to predict close contacts in the representative X-ray structures of an additional 150 protein families using only experimental MSAs.

Conclusions/Significance

Methods based on the identification of global correlations between pairs were found to be generally superior to methods based only on local correlations in their capacity to identify coevolving residues using either simulated or experimental MSAs. However, the significant variability in the performance of different methods with different proteins suggests that the simulation of MSAs that replicate the statistical properties of the experimental MSA can be a valuable tool to identify the coevolution detection method that is most effective in each case.

19.
Kinch LN, Grishin NV. Proteins, 2002, 48(1): 75–84.
Nitrogen regulatory (PII) proteins are signal transduction molecules involved in controlling nitrogen metabolism in prokaryotes. PII proteins integrate the signals of intracellular nitrogen and carbon status into the control of enzymes involved in nitrogen assimilation. Using elaborate sequence similarity detection schemes, we show that five clusters of orthologs (COGs) and several small divergent protein groups belong to the PII superfamily and predict their structure to be a (βαβ)₂ ferredoxin-like fold. Proteins from the newly emerged PII superfamily are present in all major phylogenetic lineages. The PII homologs are quite diverse, with pairwise sequence identities below random (as low as 1%) between some members of distant groups. Despite this sequence diversity, evidence suggests that the different subfamilies retain the PII trimeric structure important for ligand-binding site formation and maintain a conservation of conservations at residue positions important for PII function. Because most of the orthologous groups within the PII superfamily are composed entirely of hypothetical proteins, our remote homology-based structure prediction provides the only information about them. Analogous to structural genomics efforts, such prediction gives clues to the biological roles of these proteins and allows us to hypothesize about the locations of functional sites on model structures or to rationalize available experimental information. For instance, conserved residues in one of the families map in close proximity to each other on the PII structure, consistent with a possible metal-binding site in the proteins encoded by the locus known to affect sensitivity to divalent metal ions. The presented analysis pushes the limits of sequence similarity searches and exemplifies one of the extreme cases of reliable sequence-based structure prediction. In conjunction with structural genomics efforts to shed light on protein function, our strategies make it possible to detect homology between highly diverse sequences and are aimed at understanding the most remote evolutionary connections in the protein world.

20.
Coevolution has long been thought to drive the exaggeration of traits, promote major evolutionary transitions such as the evolution of sexual reproduction and influence epidemiological dynamics. Despite coevolution’s long suspected importance, we have yet to develop a quantitative understanding of its strength and prevalence because we lack generally applicable statistical methods that yield numerical estimates for coevolution’s strength and significance in the wild. Here, we develop a novel method that derives maximum likelihood estimates for the strength of direct pairwise coevolution by coupling a well‐established coevolutionary model to spatially structured phenotypic data. Applying our method to two well‐studied interactions reveals evidence for coevolution in both systems. Broad application of this approach has the potential to further resolve long‐standing evolutionary debates such as the role species interactions play in the evolution of sexual reproduction and the organisation of ecological communities.
