Similar articles
 20 similar articles found
1.
Lipid modification of the N-terminal Cys residue (N-acyl-S-diacylglyceryl-Cys) has been found to be an essential, ubiquitous, and unique bacterial posttranslational modification. Such a modification allows even highly hydrophilic proteins to be anchored to the membrane, where they carry out a variety of functions important for bacteria, including pathogenesis. Hence, being able to identify such proteins is of great value. To this end, we have created a comprehensive database of bacterial lipoproteins, called DOLOP, which contains information and links to molecular details for about 278 distinct lipoproteins and predicted lipoproteins from 234 completely sequenced bacterial genomes. The website also features a tool that applies a predictive algorithm to identify the presence or absence of the lipoprotein signal sequence in a user-given sequence. The experimentally verified lipoproteins have been classified into different functional classes and, more importantly, functional domain assignments using hidden Markov models from the SUPERFAMILY database have been provided for the predicted lipoproteins. Other features include primary sequence analysis, signal sequence analysis, a search facility, and an information exchange facility that allows researchers to exchange results on newly characterized lipoproteins. The website, along with additional information on the biosynthetic pathway, statistics on predicted lipoproteins, and related figures, is available at http://www.mrc-lmb.cam.ac.uk/genomes/dolop/.

2.
Streptococcus agalactiae is a significant pathogen causing invasive disease in neonates, and thus an understanding of the molecular basis of the pathogenicity of this organism is of importance. N-terminal lipidation is a major mechanism by which bacteria can tether proteins to membranes. Lipidation is directed by the presence of a cysteine-containing 'lipobox' within specific signal peptides, and this feature has greatly facilitated the bioinformatic identification of putative lipoproteins. We previously designed a taxon-specific pattern (G+LPP) for the identification of Gram-positive bacterial lipoproteins, based on the signal peptides of experimentally verified lipoproteins (Sutcliffe I.C. and Harrington D.J. Microbiology 148: 2065-2077). Pattern searches with this pattern and other bioinformatic methods have been used to identify putative lipoproteins in the recently published genomes of S. agalactiae strains 2603/V and NEM316. A core of 39 common putative lipoproteins was identified, along with 5 putative lipoproteins unique to strain 2603/V and 2 putative lipoproteins unique to strain NEM316. Thus putative lipoproteins represent ca. 2% of the S. agalactiae proteome. As in other Gram-positive bacteria, the largest functional category of S. agalactiae lipoproteins is that predicted to comprise substrate binding proteins of ABC transport systems. Other roles include lipoproteins that appear to participate in adhesion (including the previously characterised Lmb protein), protein export and folding, enzymes and several species-specific proteins of unknown function. These data suggest lipoproteins may have significant roles that influence the virulence of this important pathogen.
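As a rough illustration of how a lipobox pattern search can work, the sketch below scans the N-terminal region of a protein for a cysteine-containing motif. It uses a simplified consensus lipobox, [LVI][ASTVI][GAS]C, which is an assumption chosen for illustration and is not the G+LPP pattern itself. The function name, the position cutoff, and both toy sequences are hypothetical.

```python
import re

# Simplified consensus lipobox; an assumption for illustration only,
# NOT the G+LPP pattern referred to in the abstract.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def find_lipobox(protein, max_cys_pos=35):
    """Return the 1-based position of the lipobox cysteine, or None.

    Only motifs whose cysteine lies within the first `max_cys_pos`
    residues are accepted (a hypothetical cutoff for a signal peptide).
    """
    seq = protein.upper()
    for match in LIPOBOX.finditer(seq):
        cys_pos = match.end()  # 0-based exclusive end == 1-based position of C
        if cys_pos <= max_cys_pos:
            return cys_pos
    return None


# Made-up sequences, not taken from any genome.
toy_lipoprotein = "MKKTLLAVSLLAGCSSNDKTE"
toy_cytoplasmic = "MSDNEQRKPEDDTRKEEHHW"

if __name__ == "__main__":
    print(find_lipobox(toy_lipoprotein))
    print(find_lipobox(toy_cytoplasmic))
```

A real screen would typically combine a taxon-specific pattern with checks on the charged N-region and hydrophobic core of the signal peptide, and would validate hits with dedicated predictors.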

3.
Bacterial lipoproteins/lipopeptides inducing host innate immune responses are sensed by mammalian Toll-like receptor 2 (TLR2). These bacterial lipoproteins are structurally divided into two groups, diacylated or triacylated lipoproteins, by the absence or presence of an amide-linked fatty acid. The presence of diacylated lipoproteins has been predicted in low-GC content gram-positive bacteria and mycoplasmas based on the absence of one modification enzyme in their genomes; however, we recently determined triacylated structures in low-GC gram-positive Staphylococcus aureus, raising questions about the actual lipoprotein structure in other low-GC content gram-positive bacteria. Here, through intensive MS analyses, we identified a novel and unique bacterial lipoprotein structure containing an N-acyl-S-monoacyl-glyceryl-cysteine (named the lyso structure) from low-GC gram-positive Enterococcus faecalis, Bacillus cereus, Streptococcus sanguinis, and Lactobacillus bulgaricus. Two of the purified native lyso-form lipoproteins induced proinflammatory cytokine production from mouse macrophages in a TLR2-dependent and TLR1-independent manner but with a different dependence on TLR6. Additionally, two other new lipoprotein structures were identified. One is the "N-acetyl" lipoprotein structure containing N-acetyl-S-diacyl-glyceryl-cysteine, which was found in five gram-positive bacteria, including Bacillus subtilis. The N-acetyl lipoproteins induced the proinflammatory cytokines through the TLR2/6 heterodimer. The other was identified in a mycoplasma strain and is an unusual diacyl lipoprotein structure containing two amino acids before the lipid-modified cysteine residue. Taken together, our results suggest the existence of novel TLR2-stimulating lyso and N-acetyl forms of lipoproteins that are conserved in low-GC content gram-positive bacteria and provide clear evidence for the presence of yet to be identified key enzymes involved in bacterial lipoprotein biosynthesis.

4.
Mycobacterium tuberculosis remains the predominant bacterial scourge of mankind. Understanding of its biology and pathogenicity has been greatly advanced by the determination of whole genome sequences for this organism. Bacterial lipoproteins are a functionally diverse class of membrane-anchored proteins. The signal peptides of these proteins direct their export and post-translational lipid modification. These signal peptides are amenable to bioinformatic analysis, allowing the lipoproteins encoded in whole genomes to be catalogued. This review applies bioinformatic methods to the identification and functional characterisation of the lipoproteins encoded in the M. tuberculosis genomes. Ninety-nine putative lipoproteins were identified, so this family of proteins represents ca. 2.5% of the M. tuberculosis predicted proteome. Thus, lipoproteins represent an important class of cell envelope proteins that may contribute to the virulence of this major pathogen.

5.
Bacterial lipoproteins are a diverse and functionally important group of proteins that are amenable to bioinformatic analyses because of their unique signal peptide features. Here we have used a dataset of sequences of experimentally verified lipoproteins of Gram-positive bacteria to refine our previously described lipoprotein recognition pattern (G+LPP). Sequenced bacterial genomes can be screened for putative lipoproteins using the G+LPP pattern. The sequences identified can then be validated using online tools for lipoprotein sequence identification. We have used our protein sequence datasets to evaluate six online tools for efficacy of lipoprotein sequence identification. Our analyses demonstrate that LipoP performs best individually but that a consensus approach, incorporating outputs from predictors of general signal peptide properties, is most informative.

6.
There are now fourteen completed genomes of bacterial phytopathogens, all of which have been generated in the past six years. These genomes come from a phylogenetically diverse set of organisms, and range in size from 870 kb to more than 6 Mb. The publication of these annotated genomes has significantly helped our understanding of bacterial plant disease. These genomes have also provided important information about bacterial evolution. Examples of recently completed genomes include: Pseudomonas syringae pv. tomato, which is notable for its large repertoire of effector proteins; Leifsonia xyli subsp. xyli, the first Gram-positive bacterial genome to be sequenced; and Phytoplasma asteris, the small genome that lacks important functions previously thought to be essential in a bacterium.

7.
With the completion of the Human Genome Project in 2003, many new projects to sequence bacterial genomes were started and soon many complete bacterial genome sequences were available. The sequenced genomes of pathogenic bacteria provide useful information for understanding host-pathogen interactions. These data prove to be a new weapon in fighting against pathogenic bacteria by providing information about potential drug targets. However, the limitations of computational tools for finding potential drug targets have hindered this process and further experimental analysis. Many in silico approaches have been proposed for finding drug targets, but only a few have been automated. One such approach finds essential genes in bacterial genomes with no human homologue and predicts these as potential drug targets. The same approach is used in our tool. T-iDT, a tool for the identification of drug targets, finds essential genes by comparing a bacterial gene set against DEG (Database of Essential Genes) and excludes genes with human homologues by comparing them against a human protein database. The tool predicts both the set of essential genes and the potential target genes for the given genome. The tool was tested with Mycobacterium tuberculosis and the results were validated. With default parameters, the tool predicted 236 essential genes and 52 genes encoding potential drug targets. A pathway-based approach was used to validate these potential drug target genes. The pathways in which the products of these genes are involved were determined. Our analysis shows that almost all of these pathways are essential for bacterial survival and hence these genes encode possible drug targets. Our tool provides a fast method for finding possible drug targets in bacterial genomes with varying stringency levels. The tool will be helpful in finding possible drug targets in various pathogenic organisms and can be used for further analysis in novel therapeutic drug development.
The tool can be downloaded from http://www.milser.co.in/research.htm and http://www.srmbioinformatics.edu.in/forum.htm.
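The filtering logic described in this abstract (keep genes that match an essential-gene database, then discard those with a human homologue) can be sketched as two set operations. This is only an illustration of the idea and not the T-iDT implementation. It assumes the homology searches have already been run and reduced to sets of gene identifiers. The function name and the gene IDs are hypothetical.

```python
def predict_targets(genome_genes, deg_hits, human_hits):
    """Return (essential, targets) as sets of gene identifiers.

    genome_genes: all gene IDs in the bacterial genome
    deg_hits:     IDs with a significant match in an essential-gene database
    human_hits:   IDs with a significant match in a human protein database
    Both hit sets are assumed to come from a prior homology search.
    """
    essential = {gene for gene in genome_genes if gene in deg_hits}
    targets = essential - set(human_hits)
    return essential, targets


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    genes = ["g1", "g2", "g3", "g4", "g5"]
    essential, targets = predict_targets(
        genes, deg_hits={"g1", "g2", "g4"}, human_hits={"g2", "g5"}
    )
    print(sorted(essential))
    print(sorted(targets))
```

The stringency mentioned in the abstract would enter through the thresholds used to decide what counts as a hit in each search, which this sketch leaves to the caller.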

8.
Magnifying Genomes (MaGe) is a microbial genome annotation system based on a relational database containing information on bacterial genomes, as well as a web interface for carrying out genome annotation projects. Our system allows one to initiate the annotation of a genome at the early stage of the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes enhanced by a gene coding re-annotation process using accurate gene models, (ii) integration of results obtained with a wide range of bioinformatics methods, including exploration of gene context through searches for conserved synteny, and reconstruction of metabolic pathways, (iii) an advanced web interface allowing multiple users to refine the automatic assignment of gene product functions. MaGe is also linked to numerous well-known biological databases and systems. Our system has been thoroughly tested during the annotation of complete bacterial genomes (Acinetobacter baylyi ADP1, Pseudoalteromonas haloplanktis, Frankia alni) and is currently used in the context of several new microbial genome annotation projects. In addition, MaGe allows for annotation curation and exploration of already published genomes from various genera (e.g. Yersinia, Bacillus and Neisseria). MaGe can be accessed at http://www.genoscope.cns.fr/agc/mage.

9.
Coding information is the main source of heterogeneity (non-randomness) in the sequences of microbial genomes. The heterogeneity corresponds to a cluster structure in triplet distributions of relatively short genomic fragments (200-400 bp). We found a universal 7-cluster structure in microbial genomic sequences and explained its properties. We show that codon usage of bacterial genomes is a multi-linear function of their genomic G+C content with high accuracy. Based on the analysis of 143 completely sequenced bacterial genomes available in GenBank in August 2004, we show that four "pure" types of the 7-cluster structure are observed. Animated 3D scatter plots of the cluster structure for all 143 genomes are collected in a database available on our website (http://www.ihes.fr/~zinovyev/7clusters). The findings can be readily introduced into software for gene prediction, sequence alignment or microbial genome classification.
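To make the notion of a triplet distribution concrete, the sketch below cuts a sequence into fixed-size fragments and computes a 64-dimensional vector of non-overlapping triplet frequencies for each fragment. It is a minimal illustration under stated assumptions, not the authors' pipeline. The window size of 300 bp is simply a value inside the 200-400 bp range quoted above, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# All 64 triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(fragment, frame=0):
    """Frequencies of non-overlapping triplets read from offset `frame`.

    Triplets containing symbols other than A, C, G, T are ignored.
    Returns a list of 64 floats ordered as in TRIPLETS.
    """
    seq = fragment.upper()
    counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


def windows(genome, size=300, step=300):
    """Yield consecutive fragments of `size` bp (assumed window length)."""
    for start in range(0, len(genome) - size + 1, step):
        yield genome[start:start + size]


if __name__ == "__main__":
    freqs = triplet_frequencies("ATGATGCCC")
    print(freqs[TRIPLETS.index("ATG")], freqs[TRIPLETS.index("CCC")])
```

The resulting vectors, one per fragment, are the kind of data points on which a clustering or a principal component projection could then be run.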

10.
In special coordinates (codon position-specific nucleotide frequencies), bacterial genomes form two straight lines in 9-dimensional space: one line for eubacterial genomes, another for archaeal genomes. All 348 distinct bacterial genomes available in GenBank in April 2007 belong to these lines with high accuracy. The main challenge now is to explain the observed high accuracy. A new phenomenon of complementary symmetry for codon position-specific nucleotide frequencies is observed. The results of an analysis of several codon usage models are presented. We demonstrate that the mean-field approximation, which is also known as the context-free model, the complete independence model, or the Segre variety, can serve as a reasonable approximation to the real codon usage. The first two principal components of codon usage correlate strongly with genomic G+C content and the optimal growth temperature, respectively. The variation of codon usage along the third component is related to the curvature of the mean-field approximation. The first three eigenvalues in codon usage PCA explain 59.1%, 7.8% and 4.7% of the variation. Eubacterial and archaeal codon usage is clearly distributed along two third-order curves with genomic G+C content as a parameter.
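The two quantities named in this abstract can be illustrated with a short sketch. Position-specific nucleotide frequencies are the frequencies of A, C, G and T at each of the three codon positions: twelve numbers with three normalisation constraints, which is consistent with the nine dimensions mentioned above. Under the mean-field (complete independence) approximation, the frequency of a codon is taken to be the product of the frequencies of its three nucleotides at their respective positions. The code assumes codons made only of A, C, G and T. The function names and the toy codon list are hypothetical.

```python
from collections import Counter
from itertools import product


def position_frequencies(codons):
    """Return a list of three dicts: nucleotide frequencies at positions 1-3."""
    freqs = []
    for pos in range(3):
        counts = Counter(codon[pos] for codon in codons)
        total = sum(counts.values())
        freqs.append({nuc: counts[nuc] / total for nuc in "ACGT"})
    return freqs


def mean_field(codon, freqs):
    """Codon frequency under the independence (mean-field) approximation."""
    p = 1.0
    for pos, nuc in enumerate(codon):
        p *= freqs[pos][nuc]
    return p


if __name__ == "__main__":
    toy_codons = ["ATG", "ATG", "GCC", "GCA"]  # made-up example
    freqs = position_frequencies(toy_codons)
    print(mean_field("ATG", freqs))
    # The approximation always sums to 1 over all 64 codons.
    print(sum(mean_field("".join(c), freqs) for c in product("ACGT", repeat=3)))
```

In the toy example the observed frequency of ATG is 0.5 while the mean-field value is 0.125, which shows how the approximation discards correlations between codon positions.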

11.
Metagenomics facilitates the study of the genetic information from uncultured microbes and complex microbial communities. Assembling complete genomes from metagenomics data is difficult because most samples have high organismal complexity and strain diversity. Some studies have attempted to extract complete bacterial, archaeal, and viral genomes and often focus on species with circular genomes so that circularity can help confirm completeness. However, fewer than 100 circularized bacterial and archaeal genomes have been assembled and published from metagenomics data despite the thousands of datasets that are available. Circularized genomes are important for (1) building a reference collection as scaffolds for future assemblies, (2) providing complete gene content of a genome, (3) confirming little or no contamination of a genome, (4) studying the genomic context and synteny of genes, and (5) linking protein coding genes to ribosomal RNA genes to aid metabolic inference in 16S rRNA gene sequencing studies. We developed a semi-automated method called Jorg to help circularize small bacterial, archaeal, and viral genomes using iterative assembly, binning, and read mapping. In addition, this method exposes potential misassemblies from k-mer based assemblies. We chose species of the Candidate Phyla Radiation (CPR) to focus our initial efforts because they have small genomes and are only known to have one ribosomal RNA operon. In addition to 34 circular CPR genomes, we present one circular Margulisbacteria genome, one circular Chloroflexi genome, and two circular megaphage genomes from 19 public and published datasets. We demonstrate findings that would likely be difficult without circularizing genomes, including that ribosomal genes are likely not operonic in the majority of CPR, and that some CPR harbor diverged forms of RNase P RNA.
Code and a tutorial for this method are available at https://github.com/lmlui/Jorg, and the method is available on the DOE Systems Biology KnowledgeBase as a beta app.

12.
Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the number of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of [Formula: see text] different virulence-related genes among more than [Formula: see text] finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (e.g., Actinobacteria, Gammaproteobacteria, Firmicutes). An accuracy of 95% is obtained when classifying human pathogens and non-pathogens, using a cross-fold validation scheme with in-fold feature selection. A reduced subset of highly informative genes ([Formula: see text]) is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at [Formula: see text]), which displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification of the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. We also analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions.

13.
14.
It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. This review addresses a relatively straightforward question: “What have we learned from this vast amount of new genomic data?” Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity, even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism is important for interpretation of the genomic data.
The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this information.

15.
16.
We have recently developed a database, pDAWG, focused on information related to plant cell walls. Currently, pDAWG contains seven complete plant genomes and 12 complete algal genomes, along with computed information for individual proteins encoded in these genomes of the following types: (a) carbohydrate active enzyme (CAZy) family information when applicable; (b) phylogenetic trees of cell wall-related CAZy family proteins; (c) protein structure models if available; (d) physical and predicted interactions among proteins; (e) subcellular localization; (f) Pfam domain information; and (g) homology-based functional prediction. A querying system with a graphical interface allows a user to quickly compose information of different sorts about individual genes/proteins and to display the composite information in an intuitive manner, facilitating comparative analyses and knowledge discovery about cell wall genes. pDAWG can be accessed at http://csbl1.bmb.uga.edu/pDAWG/.

17.
Marine phages are the most abundant biological entities in the oceans. They play important roles in carbon cycling through marine food webs, gene transfer by transduction and conversion of hosts by lysogeny. The handful of marine phage genomes that have been sequenced to date, along with prophages in marine bacterial genomes, and partial sequencing of uncultivated phages are yielding glimpses of the tremendous diversity and physiological potential of the marine phage community. Common gene modules in diverse phages are providing the information necessary to make evolutionary comparisons. Finally, deciphering phage genomes is providing clues about the adaptive response of phages and their hosts to environmental cues.

18.
Choi K, Kim S. Proteins 2011, 79(4): 1118-1131
The two‐component system (TCS) is a signal transduction system that involves a histidine kinase (HK) and a response regulator (RR). Although up to hundreds of TCSs may operate in parallel in a bacterial cell, the high fidelity of TCS signaling is well maintained, minimizing irrelevant crosstalk between TCSs. When an HK gene and an RR gene in a given TCS exist in neighboring positions, it is almost certain that their protein products (i.e., HK and RR) are interacting partners. However, large bacterial genomes often have multiple HK genes and/or cognate RR genes that are not in neighboring positions. In many partially assembled genomes, some HK genes and RR genes belong to different contigs. In these cases, it is not clear which HK(s) and RR(s) interact. By combining information‐theoretic and graph‐theoretic approaches, we developed a computational method that identifies co‐evolving residue pairs between HKs and cognate RRs and predicts the interacting HK:RR pairs for each TCS. In addition, we built a TCSppWWW webserver ( http://compath.org/platcom/tcs ) that takes query sequences of pairing candidates and predicts their HK:RR pairing using precomputed models. The current release of TCSppWWW provides predictors for 48 TCSs using over 20,000 protein sequences from about 900 bacterial genomes. Three different types of predictors using Random Forest, RBF Network, and Naïve Bayes are provided. Once a set of HK and RR candidate sequences is submitted, TCSppWWW aligns the query sequences to the precomputed multiple sequence alignment of HK:RR pairs, extracts co‐evolving column positions, then returns prediction results with prediction margin and additional information. Proteins 2011. © 2010 Wiley‐Liss, Inc.
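One common information-theoretic measure of co-evolution between two alignment columns is their mutual information. The sketch below computes it for one HK column and one RR column taken from paired sequences. This is a generic illustration and an assumption about what such a measure might look like; the abstract does not state which measure the authors use. The function name and the toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a[i] and col_b[i] are the residues observed in the i-th paired
    sequence (for example, one HK column and one RR column).
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


if __name__ == "__main__":
    # Made-up columns: the first pair covaries perfectly, the second not at all.
    print(mutual_information("KKEE", "DDRR"))
    print(mutual_information("KKEE", "DRDR"))
```

High-scoring column pairs would be candidates for co-evolving positions. In practice, raw mutual information is sensitive to small sample sizes and phylogenetic bias, so corrections are usually applied before such scores are used as features.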

19.
20.