Similar Articles
 20 similar articles found.
1.
RNA structural motifs are recurrent structural elements in RNA molecules. RNA structural motif recognition aims to find RNA substructures that are similar to a query motif, and it is important for RNA structure analysis and RNA function prediction. We therefore propose a new method, RNA Structural Motif Recognition based on Least-Squares distance (LS-RSMR), to recognize RNA structural motifs effectively. We compiled a test set of five types of RNA structural motifs occurring in Escherichia coli ribosomal RNA and conducted experiments on recognizing each of these five motif types. The experimental results show that LS-RSMR outperforms four other state-of-the-art methods.
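The abstract does not define LS-RSMR's least-squares distance, so the sketch below swaps in a simpler, named stand-in: a distance-matrix RMSD (dRMSD) that compares intra-motif pairwise distances and is therefore unaffected by rotation or translation. It assumes each nucleotide is reduced to one representative atom (for example, C1′ or P) and that motif instances are contiguous windows; the functions `drmsd` and `scan_for_motif` are hypothetical and are not part of the published method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def drmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Distance-matrix RMSD between two equal-length coordinate sets.

    Only intra-set pairwise distances are compared, so the score does not
    change under any rigid rotation or translation of either set.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sets of at least two points")
    total, pairs = 0.0, 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff
            pairs += 1
    return math.sqrt(total / pairs)


def scan_for_motif(query: Sequence[Point], target: Sequence[Point],
                   cutoff: float) -> List[Tuple[int, float]]:
    """Slide a window of len(query) residues along target and return
    (start_index, score) for every window scoring at or below cutoff,
    best match first."""
    n = len(query)
    hits = []
    for start in range(len(target) - n + 1):
        score = drmsd(query, target[start:start + n])
        if score <= cutoff:
            hits.append((start, score))
    hits.sort(key=lambda hit: hit[1])
    return hits


# Toy example: the query triangle reappears, translated, at target index 1.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
target = [(5.0, 5.0, 5.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0),
          (11.0, 1.0, 0.0), (20.0, 20.0, 20.0)]
print(scan_for_motif(query, target, cutoff=0.5))
```

Because dRMSD ignores handedness, a mirror-image motif also scores zero; a superposition-based least-squares RMSD (for example, via the Kabsch algorithm) would distinguish the two at the cost of an optimal-rotation step.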

2.
The crystal structure of a conserved hypothetical protein from Escherichia coli has been determined by X-ray crystallography. The protein belongs to the Clusters of Orthologous Groups family COG1553 (National Center for Biotechnology Information database, NLM, NIH), for which no structural information was previously available. A structural homology search with the DALI algorithm indicated that this protein has a new fold with no obvious similarity to other proteins of known three-dimensional structure. The quaternary structure is a dimer of trimers with a characteristic cylindrical shape. A large closed cavity of approximately 16 Å × 16 Å × 20 Å lies at the center of the hexamer. Six putative active sites are positioned along the equatorial surface of the hexamer. Several highly conserved residues, including two possibly functional cysteines, are located in the putative active site. The possible molecular function of the protein is discussed.

3.
    
Protein function elucidation often relies heavily on amino acid sequence analysis and other bioinformatics approaches, and this reliance extends to structure homology modeling for ligand docking and protein–protein interaction mapping. However, sequence analysis of RPA3313 places it in a large, unannotated class of hypothetical proteins, mostly from the order Rhizobiales. In the absence of informative sequence and structure data, further functional elucidation of this class of proteins has been significantly hindered. A high-quality NMR structure of RPA3313 reveals that the protein forms a novel split ββαβ fold with a conserved ligand-binding pocket between the first β‐strand and the N‐terminus of the α‐helix. Conserved-residue analysis and protein–protein interaction prediction reveal multiple protein-binding sites and conserved functional residues. Results of a mass spectrometry proteomic analysis strongly point toward interaction with the ribosome and its subunits. The combined structural and proteomic analyses suggest that RPA3313, alone or in a larger complex, may assist in transporting substrates to or from the ribosome for further processing. Proteins 2016; 85:93–102. © 2016 Wiley Periodicals, Inc.

4.
    
Xanthomonas campestris pv. campestris is a Gram‐negative, yellow‐pigmented pathogenic bacterium that causes black rot, one of the major diseases of cruciferous crops worldwide. Its genome contains approximately 4500 genes, roughly one third of which have no known structure and/or function. However, some of these unknown genes are highly conserved among several different bacterial genera. XC229, a 134-residue protein, is one such gene product. It was overexpressed in Escherichia coli, purified and crystallized using the hanging‐drop vapour‐diffusion method. The crystal diffracted to a resolution of at least 1.80 Å. It is cubic and belongs to space group I2₁3, with unit‐cell parameters a = b = c = 106.8 Å, and contains one or two molecules per asymmetric unit.
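The choice between one and two molecules per asymmetric unit is usually weighed with the Matthews coefficient, V_M = V_cell / (Z · n · M), where Z is the number of asymmetric units in the cell, n the number of molecules per asymmetric unit and M the molecular mass. The sketch below assumes a mass of about 14.7 kDa (134 residues × ~110 Da, a rough average that ignores any expression tag) and uses Z = 24 for space group I2₁3; the mass is not given in the abstract, and the function names are hypothetical.

```python
def matthews_coefficient(cell_volume: float, z: int, n_molecules: int,
                         mass_da: float) -> float:
    """Matthews coefficient V_M in Å^3/Da: cell volume per dalton of protein."""
    return cell_volume / (z * n_molecules * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent content using the common 1 - 1.23/V_M relation."""
    return 1.0 - 1.23 / vm


A = 106.8                  # cubic cell edge in Å (a = b = c)
CELL_VOLUME = A ** 3       # all angles are 90°, so V = a^3
Z_I213 = 24                # symmetry-equivalent positions in space group I2₁3
MASS_DA = 134 * 110.0      # ASSUMED mass: 134 residues at ~110 Da each

for n in (1, 2):
    vm = matthews_coefficient(CELL_VOLUME, Z_I213, n, MASS_DA)
    print(f"{n} molecule(s) per ASU: V_M = {vm:.2f} Å^3/Da, "
          f"solvent ≈ {solvent_fraction(vm):.0%}")
```

Under these assumptions V_M comes out near 3.4 Å³/Da for one molecule and near 1.7 Å³/Da for two. Both fall inside the range commonly observed for protein crystals (roughly 1.6 to 3.5 Å³/Da), which is consistent with the abstract leaving the count open.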

5.
    
Structural genomics offers a potential route to the discovery of protein function. As part of a structural genomics project focused on the hyperthermophilic crenarchaeon Pyrobaculum aerophilum, a conserved hypothetical protein, PAE2754, has been expressed in Escherichia coli, purified and crystallized. Because interpretable heavy‐atom derivatives were difficult to prepare given the limited resolution and the 8–12 molecules in the asymmetric unit, two leucine residues were selected for mutation to methionine. The double mutant L65M/L80M was created, expressed with selenomethionine (SeMet) incorporated, and crystallized. The crystals are monoclinic, space group P2₁, with unit‐cell parameters a = 56.4, b = 193.3, c = 60.5 Å, β = 94.6° and eight molecules (two tetramers) in the asymmetric unit. The crystals diffract to 2.75 Å resolution and are suitable for MAD phasing.
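With eight molecules in a large monoclinic cell, a quick sanity check is the volume available per molecule. The general (triclinic) cell-volume formula used below reduces to V = abc·sin β for monoclinic cells, and Z = 2 for space group P2₁. The function name is hypothetical; turning the per-molecule volume into a Matthews coefficient would also need the protein's molecular mass, which the abstract does not give.

```python
import math


def cell_volume(a: float, b: float, c: float,
                alpha: float = 90.0, beta: float = 90.0, gamma: float = 90.0) -> float:
    """Unit-cell volume in Å^3 from edge lengths (Å) and angles (degrees).

    Uses the general triclinic formula, which covers every crystal system.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


# PAE2754 L65M/L80M crystals: P2₁, a = 56.4, b = 193.3, c = 60.5 Å, beta = 94.6°
volume = cell_volume(56.4, 193.3, 60.5, beta=94.6)
Z_P21 = 2              # asymmetric units per cell in P2₁
MOLECULES_PER_ASU = 8  # two tetramers, as reported
per_molecule = volume / (Z_P21 * MOLECULES_PER_ASU)
print(f"V = {volume:,.0f} Å^3; {per_molecule:,.0f} Å^3 per molecule")
```

For these parameters the cell volume is about 6.57 × 10⁵ Å³, leaving roughly 41,000 Å³ per molecule.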

7.
    
TT1887 and TT1465 from Thermus thermophilus HB8 are conserved hypothetical proteins annotated as possible lysine decarboxylases in the Pfam database. Here we report the crystal structures of TT1887 and TT1465 at 1.8 Å and 2.2 Å resolution, respectively, determined by the multiwavelength anomalous dispersion (MAD) method. TT1887 is a homotetramer and TT1465 a homohexamer, both in the crystal and in solution. The TT1887 and TT1465 monomers each consist of a single Rossmann-fold domain comprising six α-helices and seven β-strands, and the two monomers are quite similar. The major structural difference lies at the N terminus of TT1465, which carries two additional α-helices. A comparison of the structures revealed the elements responsible for the different oligomerization modes. The distribution of electrostatic potential on the solvent-accessible surfaces suggested putative active sites.

8.
A detailed knowledge of a protein's functional site is an absolute prerequisite for understanding its mode of action at the molecular level. However, sequence and structural information on proteins is accumulating far faster than their biochemical roles can be determined experimentally. As a result, computational methods are needed that can efficiently process the evolutionary information contained in this wealth of data, in particular information on the nature and location of functionally important sites and residues. The method presented here, conserved functional group (CFG) analysis, relies on a simplified representation of the chemical groups found in amino acid side chains to identify functional sites from a single protein structure and a number of its sequence homologues. We show that CFG analysis fully or partially predicts the location of functional sites in approximately 96% of the 470 cases tested and that, unlike other available methods, it tolerates wide variations in sequence identity. In addition, we discuss its potential in a structural genomics context, where automation, scalability and efficiency are critical and an increasing number of protein structures are determined with no prior knowledge of function. This is exemplified by our analysis of the hypothetical protein Ydde_Ecoli, whose structure was recently solved by members of the North East Structural Genomics consortium. Although the proposed active site for this protein still needs experimental validation, this example illustrates the scope of CFG analysis as a general tool for identifying residues likely to play an important role in a protein's biochemical function. Our method thus offers a convenient way to rapidly and automatically process the vast amounts of data beginning to emerge from structural genomics projects.
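The core idea of CFG analysis, scoring conservation of side-chain chemistry rather than residue identity, can be sketched on a multiple sequence alignment. The group table below is a hypothetical simplification, not the published CFG representation, and the sketch omits the structural step of mapping conserved columns onto the single solved structure to find spatially clustered sites.

```python
from collections import Counter
from typing import Dict, List, Tuple

# HYPOTHETICAL simplified side-chain chemistry classes (not the published CFG scheme).
GROUP: Dict[str, str] = {
    "D": "carboxylate", "E": "carboxylate",
    "N": "amide", "Q": "amide",
    "K": "amine", "R": "guanidinium",
    "H": "imidazole",
    "S": "hydroxyl", "T": "hydroxyl", "Y": "hydroxyl",
    "C": "thiol",
}


def conserved_group_columns(alignment: List[str],
                            min_fraction: float = 0.9) -> List[Tuple[int, str, float]]:
    """Return (column, group, fraction) for alignment columns where one
    chemical group is shared by at least min_fraction of all sequences.
    Gaps ('-') and residues without a listed group count against conservation."""
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    hits = []
    for col in range(len(alignment[0])):
        residues = [row[col] for row in alignment if row[col] != "-"]
        if not residues:
            continue
        counts = Counter(GROUP.get(res.upper()) for res in residues)
        group, count = counts.most_common(1)[0]
        fraction = count / len(alignment)
        if group is not None and fraction >= min_fraction:
            hits.append((col, group, fraction))
    return hits


alignment = ["ADHSG", "AEHTG", "GDHSA", "ADHCG"]
print(conserved_group_columns(alignment))
```

Column 1 is reported even though Asp and Glu alternate there, which illustrates why grouping by chemistry can tolerate lower sequence identity than strict residue conservation.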

10.
    
Understanding the catalytic mechanism of glycosyltransferases (GTFs) at the molecular level requires extensive structural information. Here, fold recognition methods were used to assign three-dimensional protein shapes (folds) to the currently known GTF sequences available in public databases such as GenBank and Swiss-Prot. First, GTF sequences were retrieved and grouped into clusters based on sequence similarity alone. The intracluster similarity threshold was set high enough to ensure that all members of a cluster share the same fold. A representative sequence from each cluster was then selected to form a reduced set of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results of the fold recognition methods were analyzed and compared with those of sequence-similarity search methods (BLAST and PSI-BLAST). Fold recognition confidently assigned folds to about 70% of all currently known GTF sequences, a higher rate than that achieved by sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). Three-dimensional clustering of the identified folds showed that most GTF sequences adopt the typical GTF A or GTF B fold. Our results provide no evidence that new GTF folds (i.e., folds other than GTF A and GTF B) exist. Based on the cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family.
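The clustering step (grouping sequences above a similarity threshold and keeping one representative per cluster) can be sketched as a greedy single-pass procedure. This is an assumption about how such a step might be implemented, not the authors' pipeline: it takes pre-aligned, equal-length sequences and a plain fractional-identity score, whereas real GTF sequences would first need pairwise alignment (for example with BLAST). The functions `identity` and `greedy_cluster` are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_cluster(seqs: List[str], threshold: float) -> List[List[int]]:
    """Add each sequence to the first cluster whose representative (its first
    member) it matches at >= threshold identity; otherwise start a new cluster."""
    clusters: List[List[int]] = []
    for i, seq in enumerate(seqs):
        for cluster in clusters:
            if identity(seq, seqs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:  # no existing representative is similar enough
            clusters.append([i])
    return clusters


seqs = ["MKVLAT", "MKVLAS", "GGHHPP", "MKILAT"]
clusters = greedy_cluster(seqs, threshold=0.8)
representatives = [seqs[cluster[0]] for cluster in clusters]
print(clusters, representatives)
```

Comparing each sequence only against cluster representatives keeps the work proportional to the number of clusters rather than the number of sequences; the trade-off is that the result depends on input order, which is why tools of this kind often process longer sequences first.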

11.
    
In the era of structural genomics, accurate structural alignments are needed to build good templates for homology modeling. Although many structural alignment algorithms have been developed, most ignore intermolecular interactions during alignment. As a result, structures in different oligomeric states are barely distinguishable, and correct alignments in coil regions are very hard to find. Here we present a novel approach to structural alignment using a clique finding algorithm and environmental information (SAUCE). In this approach, the alignment is built not only from structural coordinate information but also from realistic environmental information extracted from the biological unit files provided by the Protein Data Bank (PDB). First, we eliminate all environmentally unfavorable residue pairings. We then identify alignments in core regions with a maximal clique finding algorithm. Two statistics of extreme value distribution (EVD) form were developed to evaluate core region alignments. In an optional extension step, a global alignment can be derived by environment-based dynamic programming linking. We show that our method can differentiate three-dimensional structures in different oligomeric states and can find flexible alignments between multidomain structures without predetermined hinge regions. Overall performance is also evaluated on a large scale by comparison with current structural classification databases and with other alignment methods.
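The clique step can be pictured with a correspondence graph: each node pairs a residue of structure A with a residue of structure B, and two nodes are joined when the inter-residue distances they imply agree within a tolerance, so a largest clique is a largest set of mutually consistent pairings. The sketch below uses the Bron–Kerbosch algorithm with pivoting as a generic stand-in and keeps the largest of the maximal cliques it enumerates. It assumes the candidate pairings were already filtered (SAUCE's environmental filter is not modeled), and the names and tolerance are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Pair = Tuple[int, int]  # (residue index in A, residue index in B)


def correspondence_graph(a: Sequence[Point], b: Sequence[Point],
                         candidates: List[Pair], tol: float) -> Dict[Pair, Set[Pair]]:
    """Join two pairings when they use distinct residues on both sides and
    preserve the inter-residue distance to within tol (Å)."""
    adj: Dict[Pair, Set[Pair]] = {p: set() for p in candidates}
    for p, q in combinations(candidates, 2):
        (i, j), (k, l) = p, q
        if i != k and j != l and abs(math.dist(a[i], a[k]) - math.dist(b[j], b[l])) <= tol:
            adj[p].add(q)
            adj[q].add(p)
    return adj


def largest_clique(adj: Dict[Pair, Set[Pair]]) -> Set[Pair]:
    """Bron–Kerbosch with pivoting; returns the largest maximal clique found."""
    best: Set[Pair] = set()

    def expand(r: Set[Pair], p: Set[Pair], x: Set[Pair]) -> None:
        nonlocal best
        if not p and not x:  # r cannot be extended, so it is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Toy example: B is A rotated 90° about z and translated, plus one decoy residue.
A = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
B = [(10.0, 10.0, 10.0), (10.0, 13.0, 10.0), (6.0, 10.0, 10.0), (50.0, 0.0, 0.0)]
candidates = [(i, j) for i in range(len(A)) for j in range(len(B))]
print(sorted(largest_clique(correspondence_graph(A, B, candidates, tol=0.5))))
```

Clique search is exponential in the worst case, which is one practical reason to remove unfavorable pairings before building the graph, as the abstract describes.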

12.
    
XC5848, a hypothetical protein from the black-rot pathogen Xanthomonas campestris, has been chosen as a potential target for the discovery of novel folds. It is unique to the genus Xanthomonas and shares significant sequence identity mainly with corresponding proteins from other members of that genus. Here, the cloning, overexpression, purification and crystallization of the XC5848 protein are reported. The XC5848 crystals diffracted to a resolution of at least 1.68 Å. They belong to the orthorhombic space group P2₁2₁2₁, with unit‐cell parameters a = 48.13, b = 51.62, c = 82.32 Å, and contain two molecules per asymmetric unit. Preliminary structural studies indicate that XC5848 adopts the highly conserved Sm‐like α‐β‐β‐β‐β fold, although significant differences in sequence and structure were observed. It therefore represents a novel variant of the crucial Sm‐like motif that is heavily involved in mRNA splicing and degradation.

13.
    
The crystal structure of HI0074 from Haemophilus influenzae, a protein of unknown function, has been determined at 2.4 Å resolution. The molecules form an up-down four-helix bundle and associate into homodimers. The fold is most closely related to the substrate-binding domain of KNTase, yet the amino acid sequences of the two proteins show no significant homology. Sequence analyses of completely and incompletely sequenced genomes reveal that the two adjacent genes, HI0074 and HI0073, and their close relatives constitute a new family of nucleotidyltransferases, with 15 members at the time of writing. The analyses also indicate that this is one of eight families of a large nucleotidyltransferase superfamily, whose members were identified by the proximity of the genes encoding the nucleotide- and substrate-binding domains in their respective genomes. Both HI0073 and HI0074 were annotated "hypothetical" in the original genome sequencing publication. HI0073 was cloned, expressed, and purified, and was shown to form a complex with HI0074 by nondenaturing polyacrylamide gel electrophoresis, analytical size-exclusion chromatography, and dynamic light scattering. Double- and single-stranded DNA binding assays showed no evidence of DNA binding by HI0074 or by the HI0073/HI0074 complex, despite the suggestive shape of the putative binding cleft formed by the HI0074 dimer.

14.
The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often-stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess the progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated effort, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% of domains contributed new superfamilies and new folds, respectively. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH, but the distribution is highly skewed: the most populous folds and superfamilies are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life, and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes.
From the perspective of this analysis, structural genomics appears to be on track to succeed, and it is hoped that this work will inform future directions of the field.

15.
    
Identifying the biochemical functions of proteins from their three-dimensional structures is now required in the post-genome-sequencing era. Ligand binding is one of the major biochemical functions of proteins, so identifying ligands and their binding sites is the starting point for function identification. We previously reported an initial approach to structure-based function prediction based on similarity searches of molecular surfaces against a functional-site database. Here we extend that approach by expanding the search database to all heteroatom-binding sites in the Protein Data Bank (PDB) and introducing a new analysis protocol. In addition, we determined a similarity threshold line using 10 structure pairs for which both the free and the complexed structures have been solved. Finally, we applied the method extensively to newly determined hypothetical proteins, including some without annotations, and evaluated its performance.

16.
The TT1485 gene from Thermus thermophilus HB8 encodes a hypothetical protein of unknown function with about 20 sequence homologs of bacterial or archaeal origin. Together they form a family of uncharacterized proteins, the cluster of orthologous groups COG3253. Using a combination of amino acid sequence analysis, three-dimensional structural studies and biochemical assays, we identified TT1485 as a novel heme-binding protein. The crystal structure reveals that this protein is a pentamer in which each monomer adopts a β-barrel fold. TT1485 is structurally similar to muconolactone isomerase, but this similarity provided no functional clues. Amino acid sequence analysis revealed remote homology to a heme enzyme, chlorite dismutase. Strikingly, amino acid residues that are highly conserved in the homologous hypothetical proteins and in chlorite dismutase cluster around a deep cavity on the surface of each monomer. Molecular modeling shows that the cavity can accommodate a heme group, with a strictly conserved His as the heme ligand. TT1485 reconstituted with iron protoporphyrin IX chloride showed only low chlorite dismutase activity, indicating that TT1485 catalyzes a reaction other than chlorite degradation. The presence of a possible Fe–His–Asp triad in the heme proximal site suggests that TT1485 functions as a novel heme peroxidase that detoxifies hydrogen peroxide within the cell.

17.
Neural RNA recognition motif (RRM)-type RNA-binding proteins play essential roles in neural development. To search for new members of the neural RRM-type RNA-binding protein family, we screened a rat cerebral expression library with a polyclonal antibody against consensus RRM sequences. We cloned and characterized a rat cDNA belonging to the RRM-type RNA-binding protein family, which we designate drb1. Orthologs of drb1 exist in human and mouse. The predicted amino acid sequence reveals an open reading frame encoding 476 residues, with a corresponding molecular mass of 53 kDa and four RNA-binding domains. The drb1 gene is specifically expressed in fetal (E12, E16) rat brain, and its expression gradually decreases during development. In situ hybridization demonstrated neuron-specific signals in fetal rat brain. An RNA-binding assay indicated that human Drb1 protein preferentially binds poly(C) RNA. These results indicate that Drb1 is a new member of the neural RNA-binding proteins whose expression is under spatiotemporal control.

18.
    
Xanthomonas campestris pv. campestris strain 17 is a Gram‐negative, yellow‐pigmented pathogenic bacterium that causes black rot, one of the major diseases of cruciferous crops worldwide. Its genome contains approximately 4500 genes, one third of which have no known structure and/or function yet are highly conserved among several different bacterial genera. One of these gene products is XC1692, a 141-residue protein. It was overexpressed in Escherichia coli, purified and crystallized in a variety of forms using the hanging‐drop vapour‐diffusion method. The crystals diffract to at least 1.45 Å resolution. They are hexagonal and belong to space group P6₃, with unit‐cell parameters a = b = 56.9, c = 71.0 Å, and contain one molecule per asymmetric unit.

19.
The essential pre-mRNA splicing factor U2 auxiliary factor 65 kDa (U2AF(65)) recognizes the polypyrimidine tract (Py-tract) consensus sequence of the pre-mRNA using two RNA recognition motifs (RRMs), the most prevalent class of eukaryotic RNA-binding domain. The Py-tracts of higher eukaryotic pre-mRNAs are often interrupted by purines, yet U2AF(65) must identify these degenerate Py-tracts for accurate pre-mRNA splicing. Previously, the structure of a U2AF(65) variant in complex with poly(U) RNA suggested that rearrangement of flexible side chains or bound water molecules may contribute to degenerate Py-tract recognition by U2AF(65). Here, the X-ray structure of the N-terminal RRM domain of U2AF(65) (RRM1) in the absence of RNA is described at 1.47 Å resolution. Notably, RNA binding by U2AF(65) selectively stabilizes pre-existing alternative conformations of three side chains located at the RNA interface (Arg150, Lys225, and Arg227). Additionally, a flexible loop connecting the β2/β3 strands undergoes a conformational change to interact with the RNA. These pre-existing alternative conformations may contribute to the ability of U2AF(65) to recognize a variety of Py-tract sequences. This rare, high-resolution view of an important member of the RRM class of RNA-binding domains highlights the role of alternative side-chain conformations in RNA recognition.

20.
    
Xanthomonas campestris pv. campestris is a Gram‐negative, yellow‐pigmented pathogenic bacterium that causes black rot, one of the major diseases of cruciferous crops worldwide. Its genome contains approximately 4500 genes, roughly one third of which have no known structure and/or function. However, some genes of unknown function are highly conserved among several different bacterial genera. XC6422 is one such conserved hypothetical protein; it has been overexpressed in Escherichia coli, purified and crystallized in a variety of forms using the hanging‐drop vapour‐diffusion method. Crystals grew to approximately 2 × 1.5 × 0.4 mm in one week and diffracted to at least 1.6 Å resolution. They belong to the monoclinic space group C2, with one molecule per asymmetric unit and unit‐cell parameters a = 75.8, b = 79.3, c = 38.2 Å, β = 109.4°. Determination of this structure may provide insights into the protein's function.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417