Similar articles
 20 similar articles found (search time: 15 ms)
1.
Database searches can fail to detect all truly homologous sequences, particularly when dealing with short, highly sequence diverse protein families. Here, using microtubule interacting and transport (MIT) domains as an example, we have applied an approach of profile-profile matching followed by ab initio structure modelling to the detection of true homologues in the borderline significant zone of database searches. Novel MIT domains were confidently identified in USP54, containing an apparently inactive ubiquitin carboxyl-terminal hydrolase domain, a katanin-like ATPase KATNAL1, and an uncharacterized protein containing a VPS9 domain. As a proof of principle, we have confirmed the novel MIT annotation for USP54 by in vitro profiling of binding to CHMP proteins.

Structured summary

USP8 binds: CHMPs 1A, 1B, 2A, 2B, 4C
USP54 binds: CHMPs 1B, 2A, 2B, 4C, 6

2.
Designing new protein folds requires a method for simultaneously optimizing the conformation of the backbone and the side-chains. One approach to this problem is the use of a parameterized backbone, which allows the systematic exploration of families of structures. We report the crystal structure of RH3, a right-handed, three-helix coiled coil that was designed using a parameterized backbone and detailed modeling of core packing. This crystal structure was determined using another rationally designed feature, a metal-binding site that permitted experimental phasing of the X-ray data. RH3 adopted the intended fold, which has not been observed previously in biological proteins. Unanticipated structural asymmetry in the trimer was a principal source of variation within the RH3 structure. The sequence of RH3 differs from that of a previously characterized right-handed tetramer, RH4, at only one position in each 11 amino acid sequence repeat. This close similarity indicates that the design method is sensitive to the core packing interactions that specify the protein structure. Comparison of the structures of RH3 and RH4 indicates that both steric overlap and cavity formation provide strong driving forces for oligomer specificity.

3.
The basic framework for understanding the mechanisms of protein function comes from knowledge of protein structures, which can be used to model molecular recognition. Recent advances in structural biology have revealed that, despite the availability of structural data, it is nontrivial to predict the mechanism of molecular recognition, which proceeds via situation-dependent structural adaptation. The mutual selectivity of protein–protein and protein–ligand interactions often depends on conformational modulation enabled by inherent flexibility, which in turn regulates function. The explanation of a protein's function, once framed by the 'lock and key' idea, has evolved into the 'induced fit' and 'population shift' models. Dynamics is therefore an essential feature to take into account when seeking to understand the mechanism of a protein's function. The design of therapeutic molecules suffers from the plasticity of receptors, whose binding conformations cannot be predicted accurately from prior knowledge of a template structure. On the other hand, receptor flexibility provides an opportunity to improve the binding affinity of a ligand through suitable substitutions that maximize binding by modulating the receptor's surface. In this paper, we discuss, with examples, how a protein's flexibility is correlated with its function in various systems, showing why understanding this flexibility matters for applications. We also highlight the methodological challenges of investigating it computationally and of accounting for the flexible nature of the molecules in drug design.

4.
We describe a method to identify protein domain boundaries from sequence information alone based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4 %. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9 % for proteins comprising continuous domains only, and 35.4 % for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8 %. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisation.

5.
Stereochemistry could be a powerful variable for conformational tuning of polypeptides in de novo design. It may also be a useful probe of the possible role of interamide energetics in the selection and stabilization of conformation. The homopolypeptides Ac-Xxx30-NHMe, with Xxx = Ala, Val, and Leu, of diversified stereochemical structure are generated by simulated racemization with a modified GROMOS-96 force field. The polypeptides, and other systematic stereochemical variants, are folded by simulated annealing with another modified GROMOS-96 force field under dielectric constant values of 1, 4, and 10. The resultant 15,000 molecular folds of isotactic (poly-L-chiral), syndiotactic (alternating L,D-chiral), and heterotactic (random-L,D-chiral) stereochemical structure, belonging to three polypeptide series and obtained under three different folding conditions, are assessed statistically for structure-to-energy-to-conformation relationships. The results suggest that interamide electrostatics could be a major factor in secondary-structure selection in polypeptides, while main-chain stereochemistry could dictate molecular packing and therefore the relative magnitude of the hydrogen-bond and Lennard-Jones (LJ) contributions to conformational energy. A method for the computational design of heterotactic molecular folds in polypeptide structure has been developed, and the first road map for chiral tuning of polypeptide structure based on stereochemical engineering has been laid down. Broad implications for protein structure, folding, and de novo design are briefly discussed.

6.
An important objective of computational protein design is the generation of high affinity peptide inhibitors of protein-peptide interactions, both as a precursor to the development of therapeutics aimed at disrupting disease causing complexes, and as a tool to aid investigators in understanding the role of specific complexes in the cell. We have developed a computational approach to increase the affinity of a protein-peptide complex by designing N or C-terminal extensions which interact with the protein outside the canonical peptide binding pocket. In a first in silico test, we show that by simultaneously optimizing the sequence and structure of three to nine residue peptide extensions starting from short (1-6 residue) peptide stubs in the binding pocket of a peptide binding protein, the approach can recover both the conformations and the sequences of known binding peptides. Comparison with phage display and other experimental data suggests that the peptide extension approach recapitulates naturally occurring peptide binding specificity better than fixed backbone design, and that it should be useful for predicting peptide binding specificities from crystal structures. We then experimentally test the approach by designing extensions for p53 and dystroglycan-based peptides predicted to bind with increased affinity to the Mdm2 oncoprotein and to dystrophin, respectively. The measured increases in affinity are modest, revealing some limitations of the method. Based on these in silico and experimental results, we discuss future applications of the approach to the prediction and design of protein-peptide interactions.

7.
The C2 domain is one of the most frequent and widely distributed calcium-binding motifs. Its structure comprises an eight-stranded beta-sandwich with two structural types as if the result of a circular permutation. Combining sequence, structural and modelling information, we have explored, at different levels of granularity, the functional characteristics of several families of C2 domains. At the coarsest level, the similarity correlates with key structural determinants of the C2 domain fold and, at the finest level, with the domain architecture of the proteins containing them, highlighting the functional diversity between the various sub-families. The functional diversity appears as different conserved surface patches throughout this common fold. In some cases, these patches are related to substrate-binding sites whereas in others they correspond to interfaces of presumably permanent interaction between other domains within the same polypeptide chain. For those related to substrate-binding sites, the predictions overlap with biochemical data in addition to providing some novel observations. For those acting as protein-protein interfaces, our modelling analysis suggests that slight variations between families are a result of not only complementary adaptations in the interfaces involved but also different domain architecture. In the light of the sequence and structural genomic projects, the work presented here shows that modelling approaches along with careful sub-typing of protein families will be a powerful combination for a broader coverage in proteomics.

8.
We use flexible backbone protein design to explore the sequence and structure neighborhoods of naturally occurring proteins. The method samples sequence and structure space in the vicinity of a known sequence and structure by alternately optimizing the sequence for a fixed protein backbone using rotamer based sequence search, and optimizing the backbone for a fixed amino acid sequence using atomic-resolution structure prediction. We find that such a flexible backbone design method better recapitulates protein family sequence variation than sequence optimization on fixed backbones or randomly perturbed backbone ensembles for ten diverse protein structures. For the SH3 domain, the backbone structure variation in the family is also better recapitulated than in randomly perturbed backbones. The potential application of this method as a model of protein family evolution is highlighted by a concerted transition to the amino acid sequence in the structural core of one SH3 domain starting from the backbone coordinates of an homologous structure.
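The alternating scheme described in this abstract (optimize the sequence on a fixed backbone, then the backbone for a fixed sequence) can be pictured with a deliberately tiny toy model. The sketch below is only an illustration of alternating minimization: the energy function, the per-residue preferences in `PREF`, the `RESTRAINT` weight, and all function names are invented here and have nothing to do with the rotamer search or structure prediction used in the paper.

```python
# Toy illustration only: the energy, residue preferences, and restraint
# weight are hypothetical and are not taken from the abstract above.
PREF = {"A": -60.0, "V": -120.0, "G": 80.0}  # invented preferred angle per residue
RESTRAINT = 0.5  # invented weight tying the backbone to its starting values


def energy(seq, backbone, start):
    """Toy energy: residue/backbone misfit plus drift from the start backbone."""
    fit = sum((PREF[r] - b) ** 2 for r, b in zip(seq, backbone))
    drift = RESTRAINT * sum((b - s) ** 2 for b, s in zip(backbone, start))
    return fit + drift


def optimize_sequence(backbone):
    """Fixed backbone: choose, per position, the residue with the smallest misfit."""
    return "".join(min(PREF, key=lambda r: (PREF[r] - b) ** 2) for b in backbone)


def optimize_backbone(seq, start):
    """Fixed sequence: closed-form minimizer of the quadratic toy energy."""
    return [(PREF[r] + RESTRAINT * s) / (1.0 + RESTRAINT) for r, s in zip(seq, start)]


def alternate(start, rounds=5):
    """Alternate the two steps and record the energy after each one."""
    backbone = list(start)
    seq = optimize_sequence(backbone)
    history = [energy(seq, backbone, start)]
    for _ in range(rounds):
        backbone = optimize_backbone(seq, start)
        history.append(energy(seq, backbone, start))
        seq = optimize_sequence(backbone)
        history.append(energy(seq, backbone, start))
    return seq, backbone, history


seq, backbone, history = alternate([-70.0, -110.0, 60.0])
print(seq, [round(b, 1) for b in backbone])
```

Because each step minimizes the same toy energy over one set of variables while holding the other fixed, the recorded energy never increases. That is the only property this sketch is meant to show.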

9.
10.
The various motifs of RNA molecules are closely related to their structural and functional properties. To better understand the nature and distributions of such structural motifs (i.e., paired and unpaired bases in stems, junctions, hairpin loops, bulges, and internal loops) and uncover characteristic features, we analyze the large 16S and 23S ribosomal RNAs of Escherichia coli. We find that the paired and unpaired bases in structural motifs have characteristic distribution shapes and ranges; for example, the frequency distribution of paired bases in stems declines linearly with the number of bases, whereas that for unpaired bases in junctions has a pronounced peak. Significantly, our survey reveals that the ratio of total (over the entire molecule) unpaired to paired bases (0.75) and the fraction of bases in stems (0.6), junctions (0.16), hairpin loops (0.12), and bulges/internal loops (0.12) are shared by 16S and 23S ribosomal RNAs, suggesting that natural RNAs may maintain certain proportions of bases in various motifs to ensure structural integrity. These findings may help in the design of novel RNAs and in the search (via constraints) for RNA-coding motifs in genomes, problems of intense current focus.
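As a rough illustration of the unpaired-to-paired ratio reported in this abstract, the sketch below computes that ratio from a secondary structure written in dot-bracket notation. The notation, the function name `base_pairing_summary`, and the toy structure are assumptions made for this example; the abstract does not say how the authors encoded or counted their structures.

```python
def base_pairing_summary(dot_bracket: str) -> dict:
    """Count paired and unpaired bases in a dot-bracket secondary structure.

    Assumes the simple three-symbol alphabet: '(' and ')' mark paired bases,
    '.' marks an unpaired base. Pseudoknot symbols are not handled.
    """
    if not dot_bracket or set(dot_bracket) - set("()."):
        raise ValueError("expected a non-empty string of '(', ')' and '.'")
    paired = dot_bracket.count("(") + dot_bracket.count(")")
    unpaired = dot_bracket.count(".")
    if paired == 0:
        raise ValueError("structure has no paired bases; ratio is undefined")
    return {
        "paired": paired,
        "unpaired": unpaired,
        "unpaired_to_paired": unpaired / paired,
        "paired_fraction": paired / (paired + unpaired),
    }


# Hypothetical toy hairpin, not a real rRNA fragment.
summary = base_pairing_summary("((((...))))..")
print(summary["unpaired_to_paired"])  # 0.625
```

Splitting the unpaired bases further into junctions, hairpin loops, and bulges or internal loops would require parsing the nesting of the brackets, which this sketch does not attempt.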

11.
Identification and size characterization of surface pockets and occluded cavities are initial steps in protein structure-based ligand design. A new program, CAST, for automatically locating and measuring protein pockets and cavities, is based on precise computational geometry methods, including alpha shape and discrete flow theory. CAST identifies and measures pockets and pocket mouth openings, as well as cavities. The program specifies the atoms lining pockets, pocket openings, and buried cavities; the volume and area of pockets and cavities; and the area and circumference of mouth openings. CAST analysis of over 100 proteins has been carried out; proteins examined include a set of 51 monomeric enzyme-ligand structures, several elastase-inhibitor complexes, the FK506 binding protein, 30 HIV-1 protease-inhibitor complexes, and a number of small and large protein inhibitors. Medium-sized globular proteins typically have 10-20 pockets/cavities. Most often, binding sites are pockets with 1-2 mouth openings; much less frequently they are cavities. Ligand binding pockets vary widely in size, most within the range 10² to 10³ Å³. Statistical analysis reveals that the number of pockets and cavities is correlated with protein size, but there is no correlation between the size of the protein and the size of binding sites. Most frequently, the largest pocket/cavity is the active site, but there are a number of instructive exceptions. Ligand volume and binding site volume are somewhat correlated when binding site volume is ≤700 Å³, but the ligand seldom occupies the entire site. Auxiliary pockets near the active site have been suggested as additional binding surface for designed ligands (Mattos C et al., 1994, Nat Struct Biol 1:55-58). Analysis of elastase-inhibitor complexes suggests that CAST can identify ancillary pockets suitable for recruitment in ligand design strategies. Analysis of the FK506 binding protein, and of compounds developed in SAR by NMR (Shuker SB et al., 1996, Science 274:1531-1534), indicates that CAST pocket computation may provide a priori identification of target proteins for linked-fragment design. CAST analysis of 30 HIV-1 protease-inhibitor complexes shows that the flexible active site pocket can vary over a range of 853-1,566 Å³, and that there are two pockets near or adjoining the active site that may be recruited for ligand design.

12.
High levels of synonymous substitutions among alleles of the surface antigen SerH led to the hypothesis that Tetrahymena thermophila has a tremendously large effective population size, one that is greater than estimated for many prokaryotes (Lynch, M., and J. S. Conery. 2003. Science 302:1401-1404.). Here we show that SerH is unusual as there are substantially lower levels of synonymous variation at five additional loci (four nuclear and one mitochondrial) characterized from T. thermophila populations. Hence, the effective population size of T. thermophila, a model single-celled eukaryote, is lower and more consistent with estimates from other microbial eukaryotes. Moreover, reanalysis of SerH polymorphism data indicates that this protein evolves through a combination of vertical transmission of alleles and concerted evolution of repeat units within alleles. SerH may be under balancing selection due to a mechanism analogous to the maintenance of antigenic variation in vertebrate immune systems. Finally, the dual nature of ciliate genomes and particularly the amitotic divisions of processed macronuclear genomes may make it difficult to estimate accurately effective population size from synonymous polymorphisms. This is because selection and drift operate on processed chromosomes in macronuclei, where assortment of alleles, disruption of linkage groups, and recombination can alter the genetic landscape relative to more canonical eukaryotic genomes.

13.
We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user-friendly web interface and programming layer that streamlines data deposition and allows for batch input and queries. The database schema design incorporates a standard format for reporting protein sequences and experimental data that facilitates comparison of results across different data sets. A suite of analysis and visualization tools is provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing for fast searching and identification of relevant data from the existing literature, and exploring correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org.

14.
Disulfide-rich domains are small protein domains whose global folds are stabilized primarily by the formation of disulfide bonds and, to a much lesser extent, by secondary structure and hydrophobic interactions. Disulfide-rich domains perform a wide variety of roles functioning as growth factors, toxins, enzyme inhibitors, hormones, pheromones, allergens, etc. These domains are commonly found both as independent (single-domain) proteins and as domains within larger polypeptides. Here, we present a comprehensive structural classification of approximately 3000 small, disulfide-rich protein domains. We find that these domains can be arranged into 41 fold groups on the basis of structural similarity. Our fold groups, which describe broader structural relationships than existing groupings of these domains, bring together representatives with previously unacknowledged similarities; 18 of the 41 fold groups include domains from several SCOP folds. Within the fold groups, the domains are assembled into families of homologs. We define 98 families of disulfide-rich domains, some of which include newly detected homologs, particularly among knottin-like domains. On the basis of this classification, we have examined cases of convergent and divergent evolution of functions performed by disulfide-rich proteins. Disulfide bonding patterns in these domains are also evaluated. Reducible disulfide bonding patterns are much less frequent, while symmetric disulfide bonding patterns are more common than expected from random considerations. Examples of variations in disulfide bonding patterns found within families and fold groups are discussed.

15.
The majority of proteins consist of multiple domains that are either repeated or combined in defined order. In this study, we survey the combination of protein domains defined at fold and fold superfamily levels in 185 genomes belonging to organisms that have been fully sequenced and introduce a method that reconstructs rooted phylogenomic trees from the content and arrangement of domains in proteins at a genomic level. We find that the majority of domain combinations were unique to Archaea, Bacteria, or Eukarya, suggesting most combinations originated after life had diversified. Domain repeat and domain repeat within multidomain proteins increased notably in eukaryotes, mainly at the expense of single-domain and domain-pair proteins. This increase was mostly confined to Metazoa. We also find an unbalanced sharing of domain combinations which suggests that Eukarya is more closely related to Bacteria than to Archaea, an observation that challenges the widely assumed eukaryote-archaebacterial sisterhood relationship. The occurrence and abundance of the molecular repertoire (interactome) of domain combinations was used to generate phylogenomic trees. These global interactome-based phylogenies described organismal histories satisfactorily, revealing the tripartite nature of life, and supporting controversial evolutionary patterns, such as the Coelomata hypothesis, the grouping of plants and animals, and the Gram-positive origin of bacteria. Results suggest strongly that the process of domain combination is not random but curved by evolution, rejecting the null hypothesis of domain modules combining in the absence of natural selection or an optimality criterion.

16.
17.
We present a systematic study of the clustering of genes within the human genome based on homology inferred from both sequence and structural similarity. The 3D-Genomics automated proteome annotation pipeline was utilised to infer homology for each protein domain in the genome, for the 26 superfamilies most highly represented in the Structural Classification Of Proteins (SCOP) database. This approach enabled us to identify homologues that could not be detected by sequence-based methods alone. For each superfamily, we investigated the distribution, both within and among chromosomes, of genes encoding at least one domain within the superfamily. The results indicate a diversity of clustering behaviours: some superfamilies showed no evidence of any clustering, and others displayed significant clustering either within or among chromosomes, or both. Removal of tandem repeats reduced the levels of clustering observed, but some superfamilies still displayed highly significant clustering. Thus, our study suggests that either the process of gene duplication, or the evolution of the resulting clusters, differs between structural superfamilies.

18.
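Li Yingxia, Zhang Tingting, Ma Lei. 《遗传》 (Hereditas (Beijing)), 2018, 40(2): 135-144
A natural chimeric gene is a new gene formed by the natural fusion of two or more independent genes. The discovery of this type of gene overturned the classical view that "one gene corresponds to one chromosomal locus" and expanded the concept of the gene. In human cancer research, many chimeric genes have been found to cause tumor-related diseases, and they have attracted wide attention as molecular diagnostic markers for cancer. Drawing on bioinformatics studies of chimeric genes and taking oncogenes as the starting point, this article reviews progress in research on the fusion structure and function of chimeric genes. It covers the fusion characteristics, transcription, and regulation of natural chimeric genes, as well as the domain combination patterns and functions of fusion proteins, and incorporates earlier work from our group. It also discusses the difficulties and challenges facing current research and considers how the rules of chimerism might be applied to the design of new genes.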

19.
Consensus-designed ankyrin repeat (AR) proteins are thermodynamically very stable. The structural analysis of the designed AR protein E3_5 revealed that this stability is due to a regular fold with highly conserved structural motifs and H-bonding networks. However, the designed AR protein E3_19 exhibits a significantly lower stability than E3_5 (9.6 vs. 14.8 kcal/mol), despite 88% sequence identity. To investigate the structural correlates of this stability difference between E3_5 and E3_19, we determined the crystal structure of E3_19 at 1.9 Å resolution. E3_19 also has a regular AR domain fold with the characteristic H-bonding patterns. All structural features of the E3_5 and E3_19 molecules appear to be virtually identical (Cα RMSD approximately 0.7 Å). However, clear differences are observed in the surface charge distribution of the two AR proteins. E3_19 features clusters of charged residues and more exposed hydrophobic residues than E3_5. The atomic coordinates of E3_19 have been deposited in the Protein Data Bank. PDB ID: 2BKG.

20.
Using a data set of aligned protein domain superfamilies of known three-dimensional structure, we compared the location of interdomain interfaces on the tertiary folds between members of distantly related protein domain superfamilies. The data set analyzed is comprised of interdomain interfaces, with domains occurring within a polypeptide chain and those between two polypeptide chains. We observe that, in general, the interfaces between protein domains are formed entirely in different locations on the tertiary folds in such pairs. This variation in the location of interface happens in protein domains involved in a wide range of functions, such as enzymes, adapters, and domains that bind protein ligands, or cofactors. While basic biochemical functionality is preserved at the domain superfamily level, the effect of biochemical function on protein assemblies is different in these protein domains related by superfamily. The divergence between proteins, in most cases, is coupled with domain recruitment, with different modes of interaction with the recruited domain. This is in complete contrast to the observation that in closely related homologous protein domains, almost always the interaction interfaces are topologically equivalent. In a small subset of interacting domains within proteins related by remote homology, we observe that the relative positioning of domains with respect to one another is preserved. Based on the analysis of multidomain proteins of known or unknown structure, we suggest that variation in protein-protein interactions in members within a superfamily could serve as diverging points in otherwise parallel metabolic or signaling pathways. We discuss a few representative cases of diverging pathways involving domains in a superfamily.

