Similar literature
20 similar articles found.
1.
The relationship between Scleractinia and Corallimorpharia, Orders within Anthozoa distinguished by the presence of an aragonite skeleton in the former, is controversial. Although classically considered distinct groups, some phylogenetic analyses have placed the Corallimorpharia within a larger Scleractinia/Corallimorpharia clade, leading to the suggestion that the Corallimorpharia are “naked corals” that arose via skeleton loss during the Cretaceous from a Scleractinian ancestor. Scleractinian paraphyly is, however, contradicted by a number of recent phylogenetic studies based on mt nucleotide (nt) sequence data. Whereas the “naked coral” hypothesis was based on analysis of the sequences of proteins encoded by a relatively small number of mt genomes, here a much-expanded dataset was used to reinvestigate hexacorallian phylogeny. The initial observation was that, whereas analyses based on nt data support scleractinian monophyly, those based on amino acid (aa) data support the “naked coral” hypothesis, irrespective of the method and with very strong support. To better understand the bases of these contrasting results, the effects of systematic errors were examined. Compared to other hexacorallians, the mt genomes of “Robust” corals have a higher (A+T) content, codon usage is far more constrained, and the proteins that they encode have a markedly higher phenylalanine content, leading us to suggest that mt DNA repair may be impaired in this lineage. Thus the “naked coral” topology could be caused by high levels of saturation in these mitochondrial sequences, long-branch effects or model violations. The equivocal results of these extensive analyses highlight the fundamental problems of basing coral phylogeny on mitochondrial sequence data.

2.
3.
The millions of protein sequences generated by genomics are expected to transform protein engineering and personalized medicine. To achieve these goals, tools for predicting outcomes of amino acid changes must be improved. Currently, advances are hampered by insufficient experimental data about nonconserved amino acid positions. Since the property “nonconserved” is identified using a sequence alignment, we designed experiments to recapitulate that context: Mutagenesis and functional characterization were carried out in 15 LacI/GalR homologs (rows) at 12 nonconserved positions (columns). Multiple substitutions were made at each position, to reveal how various amino acids of a nonconserved column were tolerated in each protein row. Results showed that amino acid preferences of nonconserved positions were highly context-dependent, had few correlations with physico-chemical similarities, and were not predictable from their occurrence in natural LacI/GalR sequences. Further, unlike the “toggle switch” behaviors of conserved positions, substitutions at nonconserved positions could be rank-ordered to show a “rheostatic”, progressive effect on function that spanned several orders of magnitude. Comparisons to various sequence analyses suggested that conserved and strongly co-evolving positions act as functional toggles, whereas other important, nonconserved positions serve as rheostats for modifying protein function. Both the presence of rheostat positions and the sequence analysis strategy appear to be generalizable to other protein families and should be considered when engineering protein modifications or predicting the impact of protein polymorphisms.

4.
The rapid mutation of human immunodeficiency virus-type 1 (HIV-1) and the limited characterization of the composition and incidence of the variant population are major obstacles to the development of an effective HIV-1 vaccine. This issue was addressed by a comprehensive analysis of over 58,000 clade B HIV-1 protein sequences reported over at least 26 years. The sequences were aligned and the 2,874 overlapping nonamer amino acid positions of the viral proteome, each a possible core binding domain for human leukocyte antigen molecules and T-cell receptors, were quantitatively analyzed for four patterns of sequence motifs: (1) “index”, the most prevalent sequence; (2) “major” variant, the most common variant sequence; (3) “minor” variants, multiple different sequences, each with an incidence less than that of the major variant; and (4) “unique” variants, each observed only once in the alignment. The collective incidence of the major, minor, and unique variants at each nonamer position represented the total variant population for the position. Positions with more than 50% total variants contained correspondingly reduced incidences of index and major variant sequences and increased minor and unique variants. Highly diverse positions, with 80 to 98% variant nonamer sequences, were present in each protein, including 5% of Gag, and 27% of Env and Nef, each. The multitude of different variant nonamer sequences (i.e. nonatypes; up to 68%) at the highly diverse positions, represented by the major, multiple minor, and multiple unique variants, likely supports variant function both in immune escape and as altered peptide ligands with deleterious T-cell responses. The patterns of mutational change were consistent with the sequences of individual HXB2 and C1P viruses and can be considered applicable to all HIV-1 viruses. This characterization of HIV-1 protein mutation provides a foundation for the design of peptide-based vaccines and therapeutics.
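The four-way classification described in this abstract (index, major, minor, unique) can be illustrated with a minimal Python sketch for a single nonamer position. The function name `classify_nonamers` and the tie-handling are assumptions made for illustration, not the authors' pipeline: singletons are always treated as unique, and ties in count are broken by first occurrence.

```python
from collections import Counter


def classify_nonamers(nonamers):
    """Classify the nonamer sequences observed at one alignment position.

    Assumptions (not from the source): variants seen exactly once are
    'unique'; the most frequent variant seen more than once is 'major';
    all other repeated variants are 'minor'. Ties keep first-seen order.
    """
    if not nonamers:
        raise ValueError("at least one nonamer is required")
    counts = Counter(nonamers)
    total = sum(counts.values())
    ranked = counts.most_common()  # sorted by count, descending
    index_seq, index_count = ranked[0]
    variants = ranked[1:]
    unique = [seq for seq, n in variants if n == 1]
    repeated = [seq for seq, n in variants if n > 1]
    return {
        "index": index_seq,
        "major": repeated[0] if repeated else None,
        "minor": repeated[1:],
        "unique": unique,
        # Everything that is not the index sequence counts as a variant.
        "total_variant_fraction": (total - index_count) / total,
    }


# Hypothetical toy data: 11 aligned sequences at one position.
position = (["AAAAAAAAA"] * 5 + ["AAAAAAAAC"] * 3
            + ["AAAAAAAAG"] * 2 + ["AAAAAAAAT"])
result = classify_nonamers(position)
```

In this toy case the total variant fraction is 6/11, so the position would fall into the "more than 50% total variants" group mentioned in the abstract.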

5.
Intrinsically disordered regions have been associated with various cellular processes and are implicated in several human diseases, but their exact roles remain unclear. We previously defined two classes of conserved disordered regions in budding yeast, referred to as “flexible” and “constrained” conserved disorder. In flexible disorder, the property of disorder has been positionally conserved during evolution, whereas in constrained disorder, both the amino acid sequence and the property of disorder have been conserved. Here, we show that flexible and constrained disorder are widespread in the human proteome, and are particularly common in proteins with regulatory functions. Both classes of disordered sequences are highly enriched in regions of proteins that undergo tissue-specific (TS) alternative splicing (AS), but not in regions of proteins that undergo general (i.e., not tissue-regulated) AS. Flexible disorder is more highly enriched in TS alternative exons, whereas constrained disorder is more highly enriched in exons that flank TS alternative exons. These latter regions are also significantly more enriched in potential phosphosites and other short linear motifs associated with cell signaling. We further show that cancer driver mutations are significantly enriched in regions of proteins associated with TS and general AS. Collectively, our results point to distinct roles for TS alternative exons and flanking exons in the dynamic regulation of protein interaction networks in response to signaling activity, and they further suggest that alternatively spliced regions of proteins are often functionally altered by mutations responsible for cancer.

6.
The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology: “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation.

7.
Bacteriophages are the most abundant forms of life in the biosphere and carry genomes characterized by high genetic diversity and mosaic architectures. The complete sequences of 30 mycobacteriophage genomes show them collectively to encode 101 tRNAs, three tmRNAs, and 3,357 proteins belonging to 1,536 “phamilies” of related sequences, and a statistical analysis predicts that these represent approximately 50% of the total number of phamilies in the mycobacteriophage population. These phamilies contain 2.19 proteins on average; more than half (774) of them contain just a single protein sequence. Only six phamilies have representatives in more than half of the 30 genomes, and only three (encoding tape-measure proteins, lysins, and minor tail proteins) are present in all 30 phages, although these phamilies are themselves highly modular, such that no single amino acid sequence element is present in all 30 mycobacteriophage genomes. Of the 1,536 phamilies, only 230 (15%) have amino acid sequence similarity to previously reported proteins, reflecting the enormous genetic diversity of the entire phage population. The abundance and diversity of phages, the simplicity of phage isolation, and the relatively small size of phage genomes support bacteriophage isolation and comparative genomic analysis as a highly suitable platform for discovery-based education.
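The summary figures quoted in this abstract can be cross-checked with simple arithmetic. The short Python sketch below only re-derives the ratios from the counts stated in the abstract; the variable names are illustrative.

```python
# Counts quoted in the abstract above.
total_proteins = 3357
total_phamilies = 1536
single_member_phamilies = 774
phamilies_with_known_homologs = 230

# Average number of proteins per phamily.
mean_phamily_size = total_proteins / total_phamilies
# Share of phamilies that contain a single protein sequence.
single_member_fraction = single_member_phamilies / total_phamilies
# Share of phamilies with similarity to previously reported proteins.
known_homolog_fraction = phamilies_with_known_homologs / total_phamilies

print(f"{mean_phamily_size:.2f}")       # 2.19
print(f"{single_member_fraction:.1%}")  # 50.4%
print(f"{known_homolog_fraction:.1%}")  # 15.0%
```

The three values agree with the abstract's "2.19 proteins on average", "more than half (774)", and "230 (15%)".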

8.
Interactions in protein networks may place constraints on protein interface sequences to maintain correct and avoid unwanted interactions. Here we describe a “multi-constraint” protein design protocol to predict sequences optimized for multiple criteria, such as maintaining sets of interactions, and apply it to characterize the mechanism and extent to which 20 multi-specific proteins are constrained by binding to multiple partners. We find that multi-specific binding is accommodated by at least two distinct patterns. In the simplest case, all partners share key interactions, and sequences optimized for binding to either single or multiple partners recover only a subset of native amino acid residues as optimal. More interestingly, for signaling interfaces functioning as network “hubs,” we identify a different, “multi-faceted” mode, where each binding partner prefers its own subset of wild-type residues within the promiscuous binding site. Here, integration of preferences across all partners results in sequences much more “native-like” than seen in optimization for any single binding partner alone, suggesting these interfaces are substantially optimized for multi-specificity. The two strategies make distinct predictions for interface evolution and design. Shared interfaces may be better small molecule targets, whereas multi-faceted interactions may be more “designable” for altered specificity patterns. The computational methodology presented here is generalizable for examining how naturally occurring protein sequences have been selected to satisfy a variety of positive and negative constraints, as well as for rationally designing proteins to have desired patterns of altered specificity.

9.
Interactions between small molecules and proteins play critical roles in regulating and facilitating diverse biological functions, yet our ability to accurately re-engineer the specificity of these interactions using computational approaches has been limited. One main difficulty, in addition to inaccuracies in energy functions, is the exquisite sensitivity of protein–ligand interactions to subtle conformational changes, coupled with the computational problem of sampling the large conformational search space of degrees of freedom of ligands, amino acid side chains, and the protein backbone. Here, we describe two benchmarks for evaluating the accuracy of computational approaches for re-engineering protein–ligand interactions: (i) prediction of enzyme specificity altering mutations and (ii) prediction of sequence tolerance in ligand binding sites. After finding that current state-of-the-art “fixed backbone” design methods perform poorly on these tests, we develop a new “coupled moves” design method in the program Rosetta that couples changes to protein sequence with alterations in both protein side-chain and protein backbone conformations, and allows for changes in ligand rigid-body and torsion degrees of freedom. We show significantly increased accuracy in both predicting ligand specificity altering mutations and binding site sequences. These methodological improvements should be useful for many applications of protein–ligand design. The approach also provides insights into the role of subtle conformational adjustments that enable functional changes not only in engineering applications but also in natural protein evolution.

10.
By applying analysis of the principal components of amino acid physical properties we predicted cathepsin cleavage sites, MHC binding affinity, and probability of B-cell epitope binding of peptides in tetanus toxin and in ten diverse additional proteins. Cross-correlation of these metrics, for peptides of all possible amino acid index positions, each evaluated in the context of a ±25 amino acid flanking region, indicated that there is a strongly repetitive pattern of short peptides of approximately thirty amino acids each bounded by cathepsin cleavage sites and each comprising B-cell linear epitopes, MHC-I and MHC-II binding peptides. Such “immunologic kernel” peptides comprise all signals necessary for adaptive immunologic cognition, response and recall. The patterns described indicate a higher order spatial integration that forms a symbolic logic coordinating the adaptive immune system.

11.
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
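The contrast this abstract draws between a "best guess" reconstruction and one that samples from the posterior can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation: the per-site posterior probabilities are hypothetical, and the function names `most_probable_sequence` and `sampled_sequence` are invented for this example.

```python
import random


def most_probable_sequence(posteriors):
    """Pick the single most probable residue at every site ('best guess')."""
    return "".join(max(site, key=site.get) for site in posteriors)


def sampled_sequence(posteriors, rng):
    """Draw one residue per site in proportion to its posterior probability."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()))[0]
        for site in posteriors
    )


# Hypothetical posteriors for a two-site ancestor (each site sums to 1).
posteriors = [
    {"L": 0.7, "A": 0.3},  # the less frequent residue never wins under argmax
    {"K": 0.6, "E": 0.4},
]

best_guess = most_probable_sequence(posteriors)
sample = sampled_sequence(posteriors, random.Random(0))
```

The best-guess sequence always discards the lower-probability residues, which is the mechanism the abstract proposes for the overestimated stability; repeated sampling retains them at roughly their posterior frequency.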

12.
Y-linked single-nucleotide polymorphisms (SNPs) have served as powerful tools for reconstructing the worldwide genealogy of human Y chromosomes and for illuminating patrilineal relationships among modern human populations. However, there has been no systematic, worldwide survey of sequence variation within the protein-coding genes of the Y chromosome. Here we report and analyze coding sequence variation among the 16 single-copy “X-degenerate” genes of the Y chromosome. We examined variation in these genes in 105 men representing worldwide diversity, resequencing in each man an average of 27 kb of coding DNA, 40 kb of intronic DNA, and, for comparison, 15 kb of DNA in single-copy Y-chromosomal pseudogenes. There is remarkably little variation in X-degenerate protein sequences: two chromosomes drawn at random differ on average by a single amino acid, with half of these differences arising from a single, conservative Asp→Glu mutation that occurred ∼50,000 years ago. Further analysis showed that nucleotide diversity and the proportion of variant sites are significantly lower for nonsynonymous sites than for synonymous sites, introns, or pseudogenes. These differences imply that natural selection has operated effectively in preserving the amino acid sequences of the Y chromosome's X-degenerate proteins during the last ∼100,000 years of human history. Thus our findings are at odds with prominent accounts of the human Y chromosome's imminent demise.

13.
The fidelity of the folding pathways being encoded in the amino acid sequence is met with challenge in instances where proteins with no sequence homology, performing different functions and no apparent evolutionary linkage, adopt a similar fold. The problem stated otherwise is that a limited fold space is available to a repertoire of diverse sequences. The key question is what factors lead to the formation of a fold from diverse sequences. Here, with the NAD(P)-binding Rossmann fold domains as a case study and using the concepts of network theory, we have unveiled the consensus structural features that drive the formation of this fold. We have proposed a graph theoretic formalism to capture the structural details in terms of the conserved atomic interactions in the global milieu, and hence extract the essential topological features from diverse sequences. A unified mathematical representation of the different structures together with a judicious combination of several network parameters enabled us to probe into the structural features driving the adoption of the NAD(P)-binding Rossmann fold. The atomic interactions at key positions seem to be better conserved in proteins, as compared to the residues participating in these interactions. We propose a “spatial motif” and several “fold specific hot spots” that form the signature structural blueprints of the NAD(P)-binding Rossmann fold domain. Excellent agreement of our data with previous experimental and theoretical studies supports the robustness and validity of the approach. Additionally, comparison of our results with statistical coupling analysis (SCA) provides further support. The methodology proposed here is general and can be applied to similar problems of interest.

14.
ISCR Elements: Novel Gene-Capturing Systems of the 21st Century? (Total citations: 9; self-citations: 0; citations by others: 9)
“Common regions” (CRs), such as Orf513, are being increasingly linked to mega-antibiotic-resistant regions. While their overall nucleotide sequences show little identity to other mobile elements, amino acid alignments indicate that they possess the key motifs of IS91-like elements, which have been linked to the mobility ent plasmids in pathogenic Escherichia coli. Further inspection reveals that they possess an IS91-like origin of replication and termination sites (terIS), and therefore CRs probably transpose via a rolling-circle replication mechanism. Accordingly, in this review we have renamed CRs as ISCRs to give a more accurate reflection of their functional properties. The genetic context surrounding ISCRs indicates that they can procure 5′ sequences via misreading of the cognate terIS, i.e., “unchecked transposition.” Clinically, the most worrying aspect of ISCRs is that they are increasingly being linked with more potent examples of resistance, i.e., metallo-β-lactamases in Pseudomonas aeruginosa and co-trimoxazole resistance in Stenotrophomonas maltophilia. Furthermore, if ISCR elements do move via “unchecked RC transposition,” as has been speculated for ISCR1, then this mechanism provides antibiotic resistance genes with a highly mobile genetic vehicle that could greatly exceed the effects of previously reported mobile genetic mechanisms. It has been hypothesized that bacteria will surprise us by extending their “genetic construction kit” to procure and evince additional DNA and, therefore, antibiotic resistance genes. It appears that ISCR elements have now firmly established themselves within that regimen.

15.
The specific recognition between the import receptor importin-α and the nuclear localization signals (NLSs) is crucial to ensure the selective transport of cargoes into the nucleus. NLSs contain 1 or 2 clusters of positively charged amino acids, which usually bind to the major (monopartite NLSs) or both minor and major NLS-binding sites (bipartite NLSs). In our recent study, we determined the structure of importin-α1a from rice (Oryza sativa), and made 2 observations that suggest an increased utilization of the minor NLS-binding site in this protein. First, unlike the mammalian protein, both the major and minor NLS-binding sites are auto-inhibited in the unliganded rice protein. Second, we showed that NLSs of the “plant-specific” class preferentially bind to the minor NLS-binding site of rice importin-α. Here, we show that a distinct group of “minor site-specific” NLSs also bind to the minor site of the rice protein. We further show a greater enrichment of proteins containing these “plant-specific” and “minor site-specific” NLSs in the rice proteome. However, the analysis of the distribution of different classes of NLSs in diverse eukaryotes shows that in all organisms, the minor site-specific NLSs are much less prevalent than the classical monopartite and bipartite NLSs.

16.
Porcine reproductive and respiratory syndrome virus (PRRSV) is the major pathogen in the pig industry. Variability of the antigens and persistence are the biggest challenges for successful control and elimination of the disease. GP5, the major glycoprotein of PRRSV, is considered an important target of neutralizing antibodies, which however appear only late in infection. This was attributed to the presence of a “decoy epitope” located near a hypervariable region of GP5. This region also harbors the predicted signal peptide cleavage sites and (dependent on the virus strain) a variable number of potential N-glycosylation sites. Molecular processing of GP5 has not been addressed experimentally so far: whether and where the signal peptide is cleaved and (as a consequence) whether the “decoy epitope” is present in virus particles. We show that the signal peptide of GP5 from the American type 2 reference strain VR-2332 is cleaved, both during in vitro translation in the presence of microsomes and in transfected cells. This was found to be independent of neighboring glycosylation sites and occurred in a variety of porcine cells for GP5 sequences derived from various type 2 strains. The exact signal peptide cleavage site was elucidated by mass spectrometry of virus-derived and recombinant GP5. The results revealed that the signal peptide of GP5 is cleaved at two sites. As a result, a mixture of GP5 proteins exists in virus particles, some of which still contain the “decoy epitope” sequence. Heterogeneity was also observed for the use of glycosylation sites in the hypervariable region. Lastly, GP5 mutants were engineered where one of the signal peptide cleavage sites was blocked. Wildtype GP5 exhibited exactly the same SDS-PAGE mobility as the mutant that is cleavable at site 2 only. This indicates that the overwhelming majority of all GP5 molecules does not contain the “decoy epitope”.

17.
18.
19.
Previously available primer sets for detecting anaerobic ammonium-oxidizing (anammox) bacteria are inefficient, resulting in a very limited database of such sequences, which limits knowledge of their ecology. To overcome this limitation, we designed a new primer set that was 100% specific in the recovery of ~700-bp 16S rRNA gene sequences with >96% homology to the “Candidatus Scalindua” group of anammox bacteria, and we detected this group at all sites studied, including a variety of freshwater and marine sediments and permafrost soil. A second primer set was designed that exhibited greater efficiency than previous primers in recovering full-length (1,380-bp) sequences related to “Ca. Scalindua,” “Candidatus Brocadia,” and “Candidatus Kuenenia.” This study provides evidence for the widespread distribution of anammox bacteria in that it detected closely related anammox 16S rRNA gene sequences in 11 geographically and biogeochemically diverse freshwater and marine sediments.

20.

Background

Formation of the folding nucleus of globular proteins starts with the mutual interaction of a group of hydrophobic amino acids whose close contacts allow subsequent formation and stability of the 3D structure. These early steps can be predicted by simulation of the folding process through a Monte Carlo (MC) coarse-grained model in a discrete space. We previously defined MIRs (Most Interacting Residues) as the set of residues presenting a large number of non-covalent neighbour interactions during such simulation. MIRs are good candidates to define the minimal number of residues giving rise to a given fold instead of another one, although their proportion is rather high, typically 15-20% of the sequences. Having in mind experiments with two sequences of very high levels of sequence identity (up to 90%) but different folds, we combined the MIR method, which takes the sequence as its single input, with the “fuzzy oil drop” (FOD) model, which requires a 3D structure, in order to estimate the residues coding for the fold. FOD assumes that a globular protein follows an idealised 3D Gaussian distribution of hydrophobicity density, with the maximum in the centre and minima at the surface of the “drop”. If the actual local density of hydrophobicity around a given amino acid is as high as the ideal one, then this amino acid is assigned to the core of the globular protein, and it is assumed to follow the FOD model. Therefore one obtains a distribution of the amino acids of a protein according to their agreement or disagreement with the FOD model.
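The idealised 3D Gaussian distribution that the FOD model assumes can be sketched as follows. This is a minimal Python illustration, not the published FOD implementation: the function name `idealized_hydrophobicity`, the toy coordinates, and the choice of one `sigma` per axis supplied by the caller are all assumptions made for the example.

```python
import math


def idealized_hydrophobicity(coords, sigmas):
    """Return a normalised 3D Gaussian weight for each residue position.

    coords: list of (x, y, z) residue positions (hypothetical input).
    sigmas: (sx, sy, sz) spread of the 'drop' along each axis (assumed
            to be chosen by the caller; the source does not specify it).
    """
    n = len(coords)
    # Centre the drop on the centroid of the residue positions.
    centre = [sum(p[i] for p in coords) / n for i in range(3)]
    raw = [
        math.exp(-sum((p[i] - centre[i]) ** 2 / (2 * sigmas[i] ** 2)
                      for i in range(3)))
        for p in coords
    ]
    total = sum(raw)
    return [w / total for w in raw]  # weights sum to 1


# Toy example: one central residue and four more peripheral ones.
coords = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0)]
ideal = idealized_hydrophobicity(coords, (1.0, 1.0, 1.0))
```

The weight is highest at the centre and falls off toward the surface, which is the profile against which the observed local hydrophobicity of each residue is compared.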

Results

We compared and combined MIR and FOD methods to define the minimal nucleus, or keystone, of two populated folds: immunoglobulin-like (Ig) and flavodoxins (Flav). The combination of these two approaches defines some positions that are both predicted as a MIR and assigned as accordant with the FOD model. It is shown here that for these two folds, the intersection of the predicted sets of residues significantly differs from random selection. It reduces the number of residues selected relative to each individual method and allows a reasonable agreement with experimentally determined key residues coding for the particular fold. In addition, the intersection of the two methods significantly increases the specificity of the prediction, providing a robust set of residues that constitute the folding nucleus.
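Combining the two predictions amounts to a set intersection, and the comparison with random selection can be anchored by the overlap expected if the two sets were chosen independently. The Python sketch below is only an illustration under that independence assumption; the residue positions are hypothetical and the function name `combine_predictions` is invented for this example.

```python
def combine_predictions(mir_positions, fod_positions, sequence_length):
    """Intersect two residue sets and report the overlap expected by chance.

    The expected overlap |A| * |B| / N assumes both sets are drawn
    independently and uniformly from the N positions of the sequence.
    """
    mir = set(mir_positions)
    fod = set(fod_positions)
    shared = sorted(mir & fod)
    expected_random = len(mir) * len(fod) / sequence_length
    return shared, expected_random


# Hypothetical predictions for a 100-residue protein.
mir_positions = [3, 8, 15, 22, 40]
fod_positions = [8, 15, 22, 31, 57, 60]
shared, expected_random = combine_predictions(mir_positions, fod_positions, 100)
```

Here three positions are shared while only 0.3 would be expected by chance, and the combined set is smaller than either input set, as the abstract describes for the real data.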
