Similar articles
 20 similar articles found; search time 31 ms
1.
2.
Central to Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas systems are repeated RNA sequences that serve as Cas-protein–binding templates. Classification is based on the architectural composition of associated Cas proteins, but considering repeat evolution is essential to complete the picture. We compiled the largest data set of CRISPRs to date, performed comprehensive, independent clustering analyses and identified a novel set of 40 conserved sequence families and 33 potential structure motifs for Cas-endoribonucleases with some distinct conservation patterns. Evolutionary relationships are presented as a hierarchical map of sequence and structure similarities for both quick and detailed insight into the diversity of CRISPR-Cas systems. In a comparison with Cas-subtypes, I-C, I-E, I-F and type II were strongly coupled and the remaining type I and type III subtypes were loosely coupled to repeat and Cas1 evolution, respectively. Subtypes with a strong link to CRISPR evolution were almost exclusive to bacteria; nevertheless, we identified rare examples of potential horizontal transfer of I-C and I-E systems into archaeal organisms. Our easy-to-use web server provides an automated assignment of newly sequenced CRISPRs to our classification system and enables more informed choices on future hypotheses in CRISPR-Cas research: http://rna.informatik.uni-freiburg.de/CRISPRmap.

3.
Improvements in experimental techniques increasingly provide structural data relating to protein-protein interactions. Classification of structural details of protein-protein interactions can provide valuable insights for modeling and abstracting design principles. Here, we aim to cluster protein-protein interactions by their interface structures, and to exploit these clusters to obtain and study shared and distinct protein binding sites. We find that there are 22604 unique interface structures in the PDB. These unique interfaces, which provide a rich resource of structural data of protein-protein interactions, can be used for template-based docking. We test the specificity of these non-redundant unique interface structures by finding protein pairs which have multiple binding sites. We suggest that residues with more than 40% relative accessible surface area should be considered as surface residues in template-based docking studies. This comprehensive study of protein interface structures can serve as a resource for the community. The dataset can be accessed at http://prism.ccbb.ku.edu.tr/piface.
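
The 40% relative accessible surface area cutoff mentioned in this abstract can be illustrated with a minimal sketch. It assumes relative ASA is the absolute ASA of a residue divided by a reference maximum for that residue type. The function name, the input layout and all numbers below are hypothetical and are not taken from the paper or from any published reference scale.

```python
def surface_residues(asa, max_asa, threshold=0.40):
    """Return IDs of residues whose relative ASA exceeds the threshold.

    asa:     {residue_id: (residue_name, absolute ASA)}
    max_asa: {residue_name: reference maximum ASA, same units as asa}
    """
    surface = []
    for res_id, (name, area) in asa.items():
        relative = area / max_asa[name]  # fraction of the reference maximum
        if relative > threshold:
            surface.append(res_id)
    return surface


# Illustrative numbers only (assumption): real values would come from a
# solved structure and a published reference scale.
example_asa = {1: ("ALA", 60.0), 2: ("LEU", 20.0), 3: ("LYS", 150.0)}
example_max = {"ALA": 120.0, "LEU": 200.0, "LYS": 230.0}
print(surface_residues(example_asa, example_max))  # prints [1, 3]
```

Passing the reference scale in as an argument keeps the sketch independent of any particular choice of maximum ASA values.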

4.
5.
S-glutathionylation, the covalent attachment of a glutathione (GSH) to the sulfur atom of cysteine, is a selective and reversible protein post-translational modification (PTM) that regulates protein activity, localization, and stability. Despite its implication in the regulation of protein functions and cell signaling, the substrate specificity of cysteine S-glutathionylation remains unknown. Based on a total of 1783 experimentally identified S-glutathionylation sites from mouse macrophages, this work presents an informatics investigation of S-glutathionylation sites, including structural factors such as the flanking amino acid composition and the accessible surface area (ASA). TwoSampleLogo analysis shows that positively charged amino acids flanking the S-glutathionylated cysteine may influence the formation of S-glutathionylation in a closed three-dimensional environment. A statistical method is further applied to iteratively detect the conserved substrate motifs with statistical significance. A support vector machine (SVM) is then applied to generate a predictive model that considers the substrate motifs. According to five-fold cross-validation, the SVMs trained with substrate motifs achieve enhanced sensitivity, specificity, and accuracy, and perform promisingly on an independent test set. The effectiveness of the proposed method is demonstrated by the correct identification of previously reported S-glutathionylation sites of mouse thioredoxin (TXN) and human protein tyrosine phosphatase 1b (PTP1B). Finally, the constructed models are adopted to implement an effective web-based tool, named GSHSite (http://csb.cse.yzu.edu.tw/GSHSite/), for identifying uncharacterized GSH substrate sites on the protein sequences.
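
The evaluation terms used in this abstract (five-fold cross-validation, sensitivity, specificity, accuracy) have standard definitions, sketched below. The helper names and the interleaved fold assignment are assumptions made for illustration; the abstract does not say how the authors split their data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true sites recovered
    specificity = tn / (tn + fp)   # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def five_fold_indices(n_items, n_folds=5):
    """Yield (train, test) index lists; each item is tested exactly once."""
    # Interleaved assignment (assumption): item i goes to fold i % n_folds.
    folds = [list(range(i, n_items, n_folds)) for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=8, fp=1, tn=9, fn=2))
```

In practice the data would normally be shuffled, and often stratified by class, before being split into folds.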

6.
Peptidyl-prolyl isomerases catalyze the conversion between cis and trans isomers of proline. The cyclophilin family of peptidyl-prolyl isomerases is well known for being the target of the immunosuppressive drug cyclosporin, used to combat organ transplant rejection. There is great interest in both the substrate specificity of these enzymes and the design of isoform-selective ligands for them. However, the dearth of available data for individual family members inhibits attempts to design drug specificity; additionally, in order to define physiological functions for the cyclophilins, definitive isoform characterization is required. In the current study, enzymatic activity was assayed for 15 of the 17 human cyclophilin isomerase domains, and binding to the cyclosporin scaffold was tested. In order to rationalize the observed isoform diversity, the high-resolution crystallographic structures of seven cyclophilin domains were determined. These models, combined with seven previously solved cyclophilin isoforms, provide the basis for a family-wide structure∶function analysis. Detailed structural analysis of the human cyclophilin isomerase explains why cyclophilin activity against short peptides is correlated with an ability to ligate cyclosporin and why certain isoforms are not competent for either activity. In addition, we find that regions of the isomerase domain outside the proline-binding surface impart isoform specificity for both in vivo substrates and drug design. We hypothesize that there is a well-defined molecular surface corresponding to the substrate-binding S2 position that is a site of diversity in the cyclophilin family. Computational simulations of substrate binding in this region support our observations. 
Our data indicate that unique isoform determinants exist that may be exploited for development of selective ligands and suggest that the currently available small-molecule and peptide-based ligands for this class of enzyme are insufficient for isoform specificity.

Enhanced version

This article can also be viewed as an enhanced version in which the text of the article is integrated with interactive 3-D representations and animated transitions. Please note that a Web plugin is required to access this enhanced functionality. Instructions for the installation and use of the web plugin are available in Text S1.

7.
We have developed the software CopyCat, which provides easy and fast access to cophylogenetic analyses. It incorporates a wrapper for the program ParaFit, which conducts a statistical test for the presence of congruence between host and parasite phylogenies. CopyCat offers various features, such as the creation of customized host-parasite association data and the computation of phylogenetic host/parasite trees based on the NCBI taxonomy. AVAILABILITY: CopyCat and its manual are freely available at http://www-ab.informatik.uni-tuebingen.de/software/copycat. SUPPLEMENTARY INFORMATION: Results of the real-world example can be found at http://www-ab.informatik.uni-tuebingen.de/software/copycat or Bioinformatics online.

8.
P-Type ATPases are part of the regulatory system of the cell where they are responsible for transporting ions and lipids through the cell membrane. These pumps are found in all eukaryotes and their malfunction has been found to cause several severe diseases. Knowing which substrate is pumped by a certain P-Type ATPase is therefore vital. The P-Type ATPases can be divided into 11 subtypes based on their specificity, that is, the substrate that they pump. Determining the subtype experimentally is time-consuming. Thus it is of great interest to be able to accurately predict the subtype based on the amino acid sequence only. We present an approach to P-Type ATPase sequence classification based on k-nearest neighbors, similar to a homology search, and show that this method performs very well and, to the best of our knowledge, better than any existing method despite its simplicity. The classifier is made available as a web service at http://services.birc.au.dk/patbox/ which also provides access to a database of potential P-Type ATPases and their predicted subtypes.
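
The idea of classifying a sequence by its k-nearest neighbors can be sketched as follows. This is not the authors' implementation: the similarity measure here is a Jaccard index over shared 3-mers, swapped in as a simple stand-in for a homology-style score, and the sequences and subtype labels are invented for illustration.

```python
from collections import Counter


def kmers(seq, k=3):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard index of the two k-mer sets (assumed stand-in measure)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def knn_predict(query, labelled, n_neighbors=3):
    """Majority label among the n_neighbors most similar labelled sequences."""
    ranked = sorted(labelled, key=lambda item: similarity(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Toy sequences and hypothetical subtype labels, for illustration only.
training = [
    ("MKTAYIAKQR", "subtype_A"),
    ("MKTAYIAKQK", "subtype_A"),
    ("MKTAYLAKQR", "subtype_A"),
    ("GGSLLPPWWD", "subtype_B"),
    ("GGSLLPPWWE", "subtype_B"),
]
print(knn_predict("MKTAYIAKQQ", training))  # prints subtype_A
```

A real classifier would replace `similarity` with an alignment-based score and would need a rule for breaking ties between subtypes.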

9.
There is a growing interest in the non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) of microbes, fungi and plants because they can produce bioactive peptides such as antibiotics. The ability to identify the substrate specificity of the enzyme's adenylation (A) and acyl-transferase (AT) domains is essential to rationally deduce or engineer new products. We here report on a Hidden Markov Model (HMM)-based ensemble method to predict the substrate specificity at high quality. We collected a new reference set of experimentally validated sequences. An initial classification based on alignment and Neighbor Joining was performed in line with most of the previously published prediction methods. We then created and tested single substrate specific HMMs and found that their use improved the correct identification significantly for A as well as for AT domains. A major advantage of the use of HMMs is that it abolishes the dependency on multiple sequence alignment and residue selection that is hampering the alignment-based clustering methods. Using our models we obtained a high prediction quality for the substrate specificity of the A domains similar to two recently published tools that make use of HMMs or Support Vector Machines (NRPSsp and NRPS predictor2, respectively). Moreover, replacement of the single substrate specific HMMs by ensembles of models caused a clear increase in prediction quality. We argue that the superiority of the ensemble over the single model is caused by the way substrate specificity evolves for the studied systems. It is likely that this also holds true for other protein domains. The ensemble predictor has been implemented in a simple web-based tool that is available at http://www.cmbi.ru.nl/NRPS-PKS-substrate-predictor/.

10.
We present MetaRoute, an efficient search algorithm based on atom mapping rules and path weighting schemes that returns relevant or textbook-like routes between a source and a product metabolite within seconds for genome-scale networks. Its speed allows the algorithm to be used interactively through a web interface to visualize relevant routes and local networks for one or multiple organisms based on data from KEGG. AVAILABILITY: http://www-bs.informatik.uni-tuebingen.de/Services/MetaRoute. SUPPLEMENTARY INFORMATION: Supplementary details are available at http://www-bs.informatik.uni-tuebingen.de/Services/MetaRoute.

11.
12.
Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of metagenomic protein predictions adversely impacts homology detection and, therefore, the quality of the overall annotation of the dataset. Here we present a novel algorithm called GRASP that accurately identifies the homologs of a given reference protein sequence from a database consisting of partial-length metagenomic proteins. Our homology detection strategy is guided by the reference sequence, and involves the simultaneous search and assembly of overlapping database sequences. GRASP was compared to three commonly used protein sequence search programs (BLASTP, PSI-BLAST and FASTM). Our evaluations using several simulated and real datasets show that GRASP has a significantly higher sensitivity than these programs while maintaining a very high specificity. GRASP can be a very useful program for detecting and quantifying taxonomic and protein family abundances in metagenomic datasets. GRASP is implemented in GNU C++, and is freely available at http://sourceforge.net/projects/grasp-release.

13.
Strain HIMB11 is a planktonic marine bacterium isolated from coastal seawater in Kaneohe Bay, Oahu, Hawaii belonging to the ubiquitous and versatile Roseobacter clade of the alphaproteobacterial family Rhodobacteraceae. Here we describe the preliminary characteristics of strain HIMB11, including annotation of the draft genome sequence and comparative genomic analysis with other members of the Roseobacter lineage. The 3,098,747 bp draft genome is arranged in 34 contigs and contains 3,183 protein-coding genes and 54 RNA genes. Phylogenomic and 16S rRNA gene analyses indicate that HIMB11 represents a unique sublineage within the Roseobacter clade. Comparison with other publicly available genome sequences from members of the Roseobacter lineage reveals that strain HIMB11 has the genomic potential to utilize a wide variety of energy sources (e.g. organic matter, reduced inorganic sulfur, light, carbon monoxide), while possessing a reduced number of substrate transporters.

14.
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.

15.
16.
The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be utilized in an effective manner in order to efficiently identify protein substrates by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER utilizes established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites by using different, but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for use in identifying protein substrates for the enzymes. Systematic prediction analysis for the twenty-four proteases thus far included in the database revealed that the features we have included in the tool strongly improve performance in terms of cleavage site prediction, as evidenced by their contribution to performance improvement in terms of identifying known cleavage sites in substrates for these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. 
To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at http://lightning.med.monash.edu.au/PROSPER/.

17.
ProADD, a database for protein aggregation diseases, was developed to organize the data under a single platform and give researchers easy access. It integrates the diseases caused by protein aggregation and the proteins involved in each of these diseases. The database helps classify the proteins involved in protein aggregation diseases based on sequence and structural analysis. The proteins can be analyzed to mine patterns that prevail among aggregating proteins.

Availability

http://bicmku.in/ProADD

18.
Increasingly large numbers of proteins require methods for functional annotation. This is typically based on pairwise inference from the homology of either protein sequence or structure. Recently, similarity networks have been presented to leverage both the ability to visualize relationships between proteins and assess the transferability of functional inference. Here we present PANADA, a novel toolkit for the visualization and analysis of protein similarity networks in Cytoscape. Networks can be constructed based on pairwise sequence or structural alignments either on a set of proteins or, alternatively, by database search from a single sequence. The PANADA web server, an executable for download, examples, and extensive help files are available at http://protein.bio.unipd.it/panada/.

19.
The large diversity of organisms inhabiting various environmental niches on our planet are engaged in a lively exchange of biomolecules, including nutrients, hormones, and vitamins. In a quest to survive, organisms that we define as pathogens employ innovative methods to extract valuable resources from their host, leading to an infection. One such instance is where plant-associated bacterial pathogens synthesize and deploy hormones or their molecular mimics to manipulate the physiology of the host plant. This commentary describes one such specific example: the mechanism of the enzyme AldA, an aldehyde dehydrogenase (ALDH) from the bacterial plant pathogen Pseudomonas syringae which produces the plant auxin hormone indole-3-acetic acid (IAA) by oxidizing the substrate indole-3-acetaldehyde (IAAld) using the cofactor nicotinamide adenine dinucleotide (NAD+) (Bioscience Reports (2020) 40(12), https://doi.org/10.1042/BSR20202959). Using mutagenesis, enzyme kinetics, and structural analysis, Zhang et al. established that the progress of the reaction hinges on the formation of two distinct conformations of NAD(H) during the reaction course. Additionally, a key mutation in the AldA active site ‘aromatic box’ changes the enzyme’s preference for an aromatic substrate to an aliphatic one. Our commentary concludes that such molecular level investigations help to establish the nature of the dynamics of NAD(H) in ALDH-catalyzed reactions, and further show that the key active site residues control substrate specificity. We also contemplate that insights from the present study can be used to engineer novel ALDH enzymes for environmental, health, and industrial applications.

20.
Leptonema illini Hovind-Hougen 1979 is the type species of the genus Leptonema, family Leptospiraceae, phylum Spirochaetes. Organisms of this family have a Gram-negative-like cell envelope consisting of a cytoplasmic membrane and an outer membrane. The peptidoglycan layer is associated with the cytoplasmic rather than the outer membrane. The two flagella of members of Leptospiraceae extend from the cytoplasmic membrane at the ends of the bacteria into the periplasmic space and are necessary for their motility. Here we describe the features of the L. illini type strain, together with the complete genome sequence, and annotation. This is the first genome sequence (finished at the level of Improved High Quality Draft) to be reported from a member of the genus Leptonema and a representative of the third genus of the family Leptospiraceae for which complete or draft genome sequences are now available. The three scaffolds of the 4,522,760 bp draft genome sequence reported here, and its 4,230 protein-coding and 47 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project.
