Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed “sectors”. The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that they involve almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation.
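
The abstract above argues that, for single-sector proteins, per-position sequence conservation alone carries most of the signal. A minimal sketch of such a conservation score follows. It is not the SCA procedure itself: Shannon entropy over 20 amino acids is swapped in here as a simple stand-in, and the function name `column_conservation` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score per alignment column.

    Assumption: conservation = log2(20) minus the Shannon entropy of the
    residue frequencies in the column (gaps ignored). A perfectly
    conserved column scores log2(20); more variable columns score lower.
    """
    n_cols = len(alignment[0])
    scores = []
    for j in range(n_cols):
        residues = [seq[j] for seq in alignment if seq[j] != "-"]
        total = len(residues)
        if total == 0:
            # Column made only of gaps: no information, score it as 0.
            scores.append(0.0)
            continue
        counts = Counter(residues)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scores.append(math.log2(20) - entropy)
    return scores


# Toy alignment (hypothetical): column 0 is invariant, column 3 is fully variable.
toy_alignment = ["ACDE", "ACDF", "ACEG", "ATEH"]
scores = column_conservation(toy_alignment)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)  # columns ordered from most to least conserved
```

Ranking positions by such a score gives the conservation-only baseline against which a correlation-based method would have to show added value.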

2.
The evolution of circulating viruses is shaped by their need to evade antibody response, which mainly targets the viral spike. Because of the high density of spikes on the viral surface, not all antigenic sites are targeted equally by antibodies. We offer here a geometry-based approach to predict and rank the probability of surface residues of SARS spike (S protein) and influenza H1N1 spike (hemagglutinin) to acquire antibody-escaping mutations utilizing in-silico models of viral structure. We used coarse-grained MD simulations to estimate the on-rate (targeting) of an antibody model to surface residues of the spike protein. Analyzing publicly available sequences, we found that spike surface sequence diversity of the pre-pandemic seasonal influenza H1N1 and the sarbecovirus subgenus correlates strongly with our model prediction of antibody targeting. In particular, we identified an antibody-targeting gradient, which matches a mutability gradient along the main axis of the spike. This identifies the role of viral surface geometry in shaping the evolution of circulating viruses. For the 2009 H1N1 and SARS-CoV-2 pandemics, a mutability gradient along the main axis of the spike was not observed. Our model further allowed us to identify key residues of the SARS-CoV-2 spike at which antibody escape mutations have now occurred. Therefore, it can inform on the likely functional role of observed mutations and predict at which residues antibody-escaping mutations might arise.

3.
Traditional modes of investigating influenza nosocomial transmission have entailed a combination of confirmatory molecular diagnostic testing and epidemiological investigation. Common hospital-acquired infections like influenza require a discerning ability to distinguish between viral isolates to accurately identify patient transmission chains. We assessed whether influenza hemagglutinin sequence phylogenies can be used to enrich epidemiological data when investigating the extent of nosocomial transmission over a four-month period within a paediatric hospital in Cape Town, South Africa. Possible transmission chains/channels were initially determined through basic patient admission data combined with maximum likelihood and time-scaled Bayesian phylogenetic analyses. These analyses suggested that most instances of potential hospital-acquired infections resulted from multiple introductions of influenza A into the hospital, which included instances where virus hemagglutinin sequences were identical between different patients. Furthermore, a general inability to establish epidemiological transmission linkage of patients/viral isolates implied that identified isolates could have originated from asymptomatic hospital patients, visitors or hospital staff. In contrast, a traditional epidemiological investigation that used no viral phylogenetic analyses, based on patient co-admission into specific wards during a particular time-frame, suggested that multiple hospital-acquired infection instances may have stemmed from a limited number of identifiable index viral isolates/patients. This traditional epidemiological analysis by itself could incorrectly suggest linkage between unrelated cases, underestimate the number of unique infections and may overlook the possible diffuse nature of hospital transmission, which was suggested by sequencing data to be caused by multiple unique introductions of influenza A isolates into individual hospital wards. 
We have demonstrated a functional role for viral sequence data in nosocomial transmission investigation through its ability to enrich traditional, non-molecular observational epidemiological investigation by teasing out possible transmission pathways and working toward more accurately enumerating the number of possible transmission events.

4.
Conserved protein sequence segments are commonly believed to correspond to functional sites in the protein sequence. A novel approach is proposed to profile the changing degree of conservation along the protein sequence, by evaluating the occurrence frequencies of all short oligopeptides of the given sequence in a large proteome database. Thus, a protein sequence conservation profile can be plotted for every protein. The profile indicates where along the sequences the potential functional (conserved) sites are located. The corresponding oligopeptides belonging to the sites are very frequent across many prokaryotic species. Analysis of a representative set of such profiles reveals a common feature of all examined proteins: they consist of sequence modules represented by the peaks of conservation. The typical size of the modules (peak-to-peak distance) is 25-30 amino acid residues.
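
The profiling idea in the abstract above can be sketched in a few lines: count every short oligopeptide in a reference collection, then look up the count of each oligopeptide along the query. The oligopeptide length `k`, the function names and the tiny three-sequence "database" are illustrative assumptions; the abstract does not state the length used or how counts are normalised.

```python
from collections import Counter


def kmer_counts(database, k):
    """Count every length-k oligopeptide occurring in the database sequences."""
    counts = Counter()
    for seq in database:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def conservation_profile(query, counts, k):
    """For each start position in `query`, return the database count of the
    oligopeptide beginning there. Peaks mark frequently occurring segments."""
    return [counts[query[i:i + k]] for i in range(len(query) - k + 1)]


# Toy data (hypothetical); a real run would use a large proteome database.
database = ["MKTAYIAK", "GGKTAYQQ", "KTAYIAKQ"]
counts = kmer_counts(database, k=4)
print(conservation_profile("MKTAYIAK", counts, k=4))  # -> [1, 3, 2, 2, 2]
```

In this toy case the peak at position 1 corresponds to `KTAY`, the only oligopeptide shared by all three sequences.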

5.
6.
The identification of functionally important residues is an important challenge for understanding the molecular mechanisms of proteins. Membrane protein transporters undergo two-state allosteric conformational changes using functionally important cooperative residues that mediate long-range communication from the substrate binding site to the translocation pathway. In this study, we identified functionally important cooperative residues of membrane protein transporters by integrating sequence conservation and co-evolutionary information. A newly derived evolutionary feature, the co-evolutionary coupling number, was introduced to measure the connectivity of co-evolving residue pairs and was integrated with the sequence conservation score. We tested this method on three Major Facilitator Superfamily (MFS) transporters, LacY, GlpT, and EmrD. MFS transporters are an important family of membrane protein transporters, which utilize diverse substrates, catalyze different modes of transport using unique combinations of functional residues, and have enough characterized functional residues to validate the performance of our method. We found that the conserved cores of evolutionarily coupled residues are involved in specific substrate recognition and translocation of MFS transporters. Furthermore, a subset of the residues forms an interaction network connecting functional sites in the protein structure. We also confirmed that our method is effective on other membrane protein transporters. Our results provide insight into the location of functional residues important for the molecular mechanisms of membrane protein transporters.
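
One way to read the "co-evolutionary coupling number" described above is as the number of co-evolving pairs each residue takes part in. The sketch below follows that reading. The threshold, the pair scores, and especially the way the coupling number is combined with conservation (a product of normalised values) are assumptions made for illustration; the abstract does not specify them.

```python
def coupling_numbers(pair_scores, n_residues, threshold):
    """Count, for each residue, how many pairs it belongs to whose
    co-evolution score is at or above `threshold` (assumed cut-off)."""
    numbers = [0] * n_residues
    for (i, j), score in pair_scores.items():
        if score >= threshold:
            numbers[i] += 1
            numbers[j] += 1
    return numbers


def combined_scores(conservation, numbers):
    """Combine conservation with coupling number.

    Assumption: product of the two values after scaling each by its
    maximum; this integration rule is hypothetical.
    """
    max_c = max(conservation) or 1
    max_n = max(numbers) or 1
    return [(c / max_c) * (n / max_n) for c, n in zip(conservation, numbers)]


# Toy input (hypothetical): co-evolution scores for residue pairs (i, j).
pair_scores = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.1, (2, 3): 0.7}
numbers = coupling_numbers(pair_scores, n_residues=4, threshold=0.5)
print(numbers)  # -> [2, 1, 2, 1]
print(combined_scores([1.0, 0.5, 0.5, 1.0], numbers))  # -> [1.0, 0.25, 0.5, 0.5]
```

Under this reading, residue 0 ranks highest because it is both conserved and well connected, which matches the abstract's notion of a conserved core of coupled residues.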

8.
Evolutionary arms races between pathogens and their hosts may be manifested as selection for rapid evolutionary change of key genes, and are sometimes detectable through sequence-level analyses. In the case of protein-coding genes, such analyses frequently predict that specific codons are under positive selection. However, detecting positive selection can be non-trivial, and false positive predictions are a common concern in such analyses. It is therefore helpful to place such predictions within a structural and functional context. Here, we focus on the p19 protein from tombusviruses. P19 is a homodimer that sequesters siRNAs, thereby preventing the host RNAi machinery from shutting down viral infection. Sequence analysis of the p19 gene is complicated by the fact that it is constrained at the sequence level by overprinting of a viral movement protein gene. Using homology modeling, in silico mutation and molecular dynamics simulations, we assess how non-synonymous changes to two residues involved in forming the dimer interface (one invariant, and one predicted to be under positive selection) impact molecular function. Interestingly, we find that neither observed variation nor potential variation (where a non-synonymous change to p19 would be synonymous for the overprinted movement protein) significantly impacts protein structure or RNA binding. Consequently, while several methods identify residues at the dimer interface as being under positive selection, MD results suggest they are functionally indistinguishable from a site that is free to vary. Our analyses serve as a caveat to using sequence-level analyses in isolation to detect and assess positive selection, and emphasize the importance of also accounting for how non-synonymous changes impact structure and function.

9.
The binding between an enzyme and its substrate is highly specific, despite the fact that many different enzymes show significant sequence and structure similarity. There must be, then, substrate specificity-determining residues that enable different enzymes to recognize their unique substrates. We reason that a coordinated, not independent, action of both conserved and non-conserved residues determines enzymatic activity and specificity. Here, we present a surface patch ranking (SPR) method for in silico discovery of substrate specificity-determining residue clusters by exploring both sequence conservation and correlated mutations. As case studies we apply SPR to several highly homologous enzymatic protein pairs, such as guanylyl versus adenylyl cyclases, lactate versus malate dehydrogenases, and trypsin versus chymotrypsin. Without using experimental data, we predict several single and multi-residue clusters that are consistent with previous mutagenesis experimental results. Most single-residue clusters are directly involved in enzyme-substrate interactions, whereas multi-residue clusters are vital for domain-domain and regulator-enzyme interactions, indicating their complementary role in specificity determination. These results demonstrate that SPR may help the selection of target residues for mutagenesis experiments and, thus, focus rational drug design, protein engineering, and functional annotation on the relevant regions of a protein.

10.
11.
Tripartite motif protein 22 (TRIM22) is an evolutionarily ancient protein that plays an integral role in the host innate immune response to viruses. The antiviral TRIM22 protein has been shown to inhibit the replication of a number of viruses, including HIV-1, hepatitis B, and influenza A. TRIM22 expression has also been associated with multiple sclerosis, cancer, and autoimmune disease. In this study, multiple in silico computational methods were used to identify non-synonymous or amino acid-changing SNPs (nsSNP) that are deleterious to TRIM22 structure and/or function. A sequence homology-based approach was adopted for screening nsSNPs in TRIM22, including six different in silico prediction algorithms and evolutionary conservation data from the ConSurf web server. In total, 14 high-risk nsSNPs were identified in TRIM22, most of which are located in a protein interaction module called the B30.2 domain. Additionally, 9 of the top high-risk nsSNPs altered the putative structure of TRIM22's B30.2 domain, particularly in the surface-exposed v2 and v3 regions. These same regions are critical for retroviral restriction by the closely related TRIM5α protein. A number of putative structural and functional residues, including several sites that undergo post-translational modification, were also identified in TRIM22. This study is the first extensive in silico analysis of the highly polymorphic TRIM22 gene and will be a valuable resource for future targeted mechanistic and population-based studies.

12.

Background

There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses.

Methodology/Principal Findings

We show that current criteria are insufficient to build alignments for use with covariation analyses, as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally, we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature.

Conclusions/Significance

Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions, because they emphasize pairwise covariation over group covariation.
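
To make "pairwise covariation" concrete, the sketch below computes mutual information between two alignment columns. Mutual information is a widely used covariation measure and is used here only as an illustration; it is not claimed to be either of the two statistics the abstract describes. The function name and toy alignments are hypothetical.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    Sequences with a gap in either column are skipped.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    count_ij = Counter(pairs)
    # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in count_ij.items()
    )


# Toy alignments (hypothetical).
covarying = ["AK", "AK", "DE", "DE"]    # column 1 is determined by column 0
independent = ["AK", "AE", "DK", "DE"]  # every combination occurs once
print(mutual_information(covarying, 0, 1))    # 1 bit of shared information
print(mutual_information(independent, 0, 1))  # no shared information
```

A misaligned block shifts residues between columns, which changes these pair counts; that is the kind of sensitivity the abstract proposes to exploit for detecting alignment errors.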

13.
14.
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing for each amino acid position the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions. 
Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.

15.
The H1N1 subtype of influenza A virus has caused two of the four documented pandemics and is responsible for seasonal epidemic outbreaks, presenting a continuous threat to public health. The co-circulation of antigenically divergent influenza strains significantly complicates vaccine development and use. Here, by combining evolutionary, structural, functional, and population information about the H1N1 proteome, we seek to answer two questions: (1) do residues on the protein surfaces evolve faster than the protein core residues consistently across all proteins that constitute the influenza proteome? and (2) in spite of the rapid evolution of surface residues in influenza proteins, are there any regions on the protein surface that do not evolve? To answer these questions, we first built phylogenetically-aware models of the patterns of surface and interior substitutions. Employing these models, we found a single coherent pattern of faster evolution on the protein surfaces that characterizes all influenza proteins. The pattern is consistent with the events of inter-species reassortment, the worldwide introduction of the flu vaccine in the early 1980s, as well as the differences caused by the geographic origins of the virus. Next, we developed an automated computational pipeline to comprehensively detect regions of protein surface residues that were 100% conserved over multiple years and in multiple host species. We identified conserved regions on the surface of 10 influenza proteins spread across all avian, swine, and human strains, with the exception of a small group of isolated strains that affected the conservation of three proteins. Surprisingly, these regions were also unaffected by genetic variation in the pandemic 2009 H1N1 viral population data obtained from deep sequencing experiments. Finally, the conserved regions were intrinsically related to the intra-viral macromolecular interaction interfaces. 
Our study may provide further insights towards the identification of novel protein targets for influenza antivirals.

16.
Arenaviruses are negative-strand RNA viruses that cause human diseases such as lymphocytic choriomeningitis, Bolivian hemorrhagic fever, and Lassa hemorrhagic fever. No licensed vaccines exist, and current treatment is limited to ribavirin. The prototypic arenavirus, lymphocytic choriomeningitis virus (LCMV), is a model for dissecting virus-host interactions in persistent and acute disease. The RING finger protein Z has been identified as the driving force of arenaviral budding and acts as the viral matrix protein. While residues in Z required for viral budding have been described, residues that govern the Z matrix function(s) have yet to be fully elucidated. Because this matrix function is integral to viral assembly, we reasoned that this would be reflected in sequence conservation. Using sequence alignment, we identified several conserved residues in Z outside the RING and late domains. Nine residues were each mutated to alanine in Lassa fever virus Z. All of the mutations affected the expression of an LCMV minigenome and the infectivity of virus-like particles, but to greatly varying degrees. Interestingly, no mutations appeared to affect Z-mediated budding or association with viral GP. Our findings provide direct experimental evidence supporting a role for Z in the modulation of the activity of the viral ribonucleoprotein (RNP) complex and its packaging into mature infectious viral particles.

17.
Viruses can exploit a variety of strategies to evade immune surveillance by cytotoxic T lymphocytes (CTL), including the acquisition of mutations in CTL epitopes. For influenza A viruses as well, a number of amino acid substitutions in the nucleoprotein (NP) have been associated with escape from CTL. However, other previously identified influenza A virus CTL epitopes are highly conserved, including the immunodominant HLA-A*0201-restricted epitope from the matrix protein, M1(58-66). We hypothesized that functional constraints were responsible for the conserved nature of influenza A virus CTL epitopes, limiting escape from CTL. To assess the impact of amino acid substitutions in conserved epitopes on viral fitness and recognition by specific CTL, we performed a mutational analysis of CTL epitopes. Both alanine replacements and more conservative substitutions were introduced at various positions of different influenza A virus CTL epitopes. Alanine replacements for each of the nine amino acids of the M1(58-66) epitope were tolerated to various extents, except for the anchor residue at the second position. Substitution of anchor residues in other influenza A virus CTL epitopes also affected viral fitness. Viable mutant viruses were used in CTL recognition experiments. The results are discussed in the light of the potential for influenza viruses to escape from specific CTL. It was speculated that functional constraints limit variation in certain epitopes, especially at anchor residues, explaining the conserved nature of these epitopes.

18.
Greene LH  Hamada D  Eyles SJ  Brew K 《FEBS letters》2003,553(1-2):39-44
We systematically identify a group of evolutionarily conserved residues proposed for folding in a model beta-barrel superfamily, the lipocalins. The nature of conservation at the structural level is defined and we show that the conserved residues are involved in a network of interactions that form the core of the fold. Exploratory kinetic studies are conducted with a model superfamily member, human serum retinol-binding protein, to examine their role. The present results, coupled with key experimental studies conducted with another lipocalin, beta-lactoglobulin, suggest that the evolutionarily conserved regions fold on a faster folding time-scale than the non-conserved regions.

19.
The ability to identify the functional correlates of structural and sequence variation in proteins is a critical capability. We related structures of influenza A N10 and N11 proteins that have no established function to structures of proteins with known function by identifying spatially conserved atoms. We identified atoms with common distributed spatial occupancy in PDB structures of N10 protein, N11 protein, an influenza A neuraminidase, an influenza B neuraminidase, and a bacterial neuraminidase. By superposing these spatially conserved atoms, we aligned the structures and associated molecules. We report spatially and sequence invariant residues in the aligned structures. Spatially invariant residues in the N6 and influenza B neuraminidase active sites were found in previously unidentified spatially equivalent sites in the N10 and N11 proteins. We found the corresponding secondary and tertiary structures of the aligned proteins to be largely identical despite significant sequence divergence. We found structural precedent in known non-neuraminidase structures for residues exhibiting structural and sequence divergence in the aligned structures. In N10 protein, we identified staphylococcal enterotoxin I-like domains. In N11 protein, we identified hepatitis E E2S-like domains, SARS spike protein-like domains, and toxin components shared by alpha-bungarotoxin, staphylococcal enterotoxin I, anthrax lethal factor, clostridium botulinum neurotoxin, and clostridium tetanus toxin. The presence of active site components common to the N6, influenza B, and S. pneumoniae neuraminidases in the N10 and N11 proteins, combined with the absence of apparent neuraminidase function, suggests that the role of neuraminidases in H17N10 and H18N11 emerging influenza A viruses may have changed. 
The presentation of E2S-like, SARS spike protein-like, or toxin-like domains by the N10 and N11 proteins in these emerging viruses may indicate that H17N10 and H18N11 sialidase-facilitated cell entry has been supplemented or replaced by sialidase-independent receptor binding to an expanded cell population that may include neurons and T-cells.

20.
Statistical analyses of genome sequence‐derived protein sequence data can identify amino acid residues that interact between proteins or between domains of a protein. These statistical methods are based on evolution‐directed amino acid variation responding to structural and functional constraints in proteins. The identified residues form a basis for determining structure and folding of proteins as well as inferring mechanisms of protein function. When applied to two‐component systems, these methods have been shown by several research groups to identify the amino acid interactions between response regulators and histidine kinases and the specificity therein. Recently, statistical studies between the HisKA and HATPase‐ATP‐binding domains of histidine kinases identified amino acid interactions for both the inactive and the active catalytic states of such kinases. The identified interactions generated a model structure for the domain conformation of the active state. This conformation requires an unwinding of a portion of the C‐terminal helix of the HisKA domain that destroys the inactive state residue contacts and suggests how signal‐binding determines the equilibrium between the inactive and active states of histidine kinases. The rapidly accumulating protein sequence databases from genome, metagenome and microbiome studies are an important resource for functional and structural understanding of proteins and protein complexes in microbes.
