Similar literature
 20 similar articles found (search time: 15 ms)
1.
Phydbac is an interactive web resource based on phylogenomic profiling, designed to help microbiologists annotate bacterial proteins. Phylogenomic annotation is based on the assumption that functionally linked protein-coding genes must evolve in a coordinated manner. The detection of subsets of co-evolving genes within a given genome involves the computation of protein sequence conservation profiles across a spectrum of microbial species, followed by the identification of significant pairwise correlations between them. Many ongoing studies are devoted to the problem of computing the most biologically significant phylogenomic profiles and how best to identify clusters of 'functionally interacting' genes. Here we introduce a web tool, Phydbac, allowing the dynamic construction of phylogenomic profiles of protein sequences of interest and their interactive display. In addition, Phydbac can identify Escherichia coli proteins exhibiting the evolution pattern most similar to arbitrary query protein sequences, hence providing functional hints for open reading frames (ORFs) of hypothetical or unknown function. The phylogenomic profiles of all E. coli K-12 protein-coding genes are pre-computed, allowing queries about E. coli genes to be answered instantaneously. The profiles and phylogenomic neighborhoods are computed using an original method shown to perform better than previous ones. An extension of Phydbac, including precomputed profiles for all available bacterial genomes (including major pathogens), will soon be available. Phydbac can be accessed at: http://igs-server.cnrs-mrs.fr/phydbac/.
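The profile comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes each protein is represented by a list of conservation scores, one per reference genome, and ranks candidate partners by Pearson correlation. The protein names, the scores, and the `rank_neighbors` helper are hypothetical; the abstract does not specify Phydbac's own scoring method, so this is not a reproduction of it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length conservation profiles."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-evolution signal
    return cov / sqrt(var_x * var_y)


def rank_neighbors(query, profiles):
    """Return (name, correlation) pairs, most similar profile first."""
    scored = [(name, pearson(query, profile)) for name, profile in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical conservation scores across five reference genomes.
profiles = {
    "protA": [0.9, 0.8, 0.0, 0.1, 0.7],
    "protB": [0.1, 0.0, 0.9, 0.8, 0.2],
}
query = [1.0, 0.9, 0.1, 0.0, 0.8]
ranking = rank_neighbors(query, profiles)
```

In this toy data set `protA` is conserved in the same genomes as the query and ranks first, while `protB` shows the opposite pattern and receives a negative correlation.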

2.
Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here, we developed a deep mutational scanning (DMS) framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin‐like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features and confidently identify pathogenic variation. Assays potentially amenable to deep mutational scanning are already available for 57% of human disease genes, suggesting that DMS could ultimately map functional variation for all human disease genes.

3.
The rice (Oryza sativa) genome contains 1,429 protein kinases, the vast majority of which have unknown functions. We created a phylogenomic database (http://rkd.ucdavis.edu) to facilitate functional analysis of this large gene family. Sequence and genomic data, including gene expression data and protein-protein interaction maps, can be displayed for each selected kinase in the context of a phylogenetic tree, allowing for comparative analysis both within and between large kinase subfamilies. Interaction maps are easily accessed through links and displayed using Cytoscape, an open-source software platform. Chromosomal distribution of all rice kinases can also be explored via an interactive interface.

4.
Pterin-4a-carbinolamine dehydratases (PCDs) recycle oxidized pterin cofactors generated by aromatic amino acid hydroxylases (AAHs). PCDs are known biochemically only from animals and one bacterium, but PCD-like proteins (COG2154 in the Clusters of Orthologous Groups [COGs] database) are encoded by many plant and microbial genomes. Because these genomes often encode no AAH homologs, the annotation of their COG2154 proteins as PCDs is questionable. Moreover, some COG2154 proteins lack canonical residues that are catalytically important in mammalian PCDs. Diverse COG2154 proteins of plant, fungal, protistan, and prokaryotic origin were therefore tested for PCD activity by functional complementation in Escherichia coli, and the plant proteins were localized using green fluorescent protein fusions. Higher and lower plants proved to have two COG2154 proteins, a mitochondrial one with PCD activity and a noncanonical, plastidial one without. Phylogenetic analysis indicated that the latter is unique to plants and arose from the former early in the plant lineage. All 10 microbial COG2154 proteins tested had PCD activity; six of these came from genomes with no AAH, and six were noncanonical. The results suggested the motif [EDKH]-x(3)-H-[HN]-[PCS]-x(5,6)-[YWF]-x(9)-[HW]-x(8,15)-D as a signature for PCD activity. Organisms having a functional PCD but no AAH partner include angiosperms, yeast, and various prokaryotes. In these cases, PCD presumably has another function. An ancillary role in molybdopterin cofactor metabolism, hypothesized from phylogenomic evidence, was supported by demonstrating significantly lowered activities of two molybdoenzymes in Arabidopsis thaliana PCD knockout mutants. Besides this role, we propose that partnerless PCDs support the function of as yet unrecognized pterin-dependent enzymes.
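The signature motif quoted in this abstract is written in PROSITE-style pattern notation, which maps directly onto a regular expression. The sketch below is a minimal converter that handles only the element types appearing in this motif (single residues, bracketed residue sets, and `x` with an optional repeat count or range). The function names and the test sequence are hypothetical; the sequence is a synthetic string built to satisfy the pattern, not a real protein.

```python
import re

# Motif exactly as given in the abstract.
MOTIF = "[EDKH]-x(3)-H-[HN]-[PCS]-x(5,6)-[YWF]-x(9)-[HW]-x(8,15)-D"

# One pattern element: x, a single residue, or a bracketed set,
# optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


PCD_REGEX = re.compile(prosite_to_regex(MOTIF))


def has_pcd_signature(sequence):
    """True if the protein sequence contains the motif anywhere."""
    return PCD_REGEX.search(sequence.upper()) is not None


# Synthetic sequence constructed to match the motif (not a real protein).
toy_sequence = "E" + "AAA" + "H" + "H" + "P" + "AAAAA" + "Y" + "A" * 9 + "H" + "A" * 8 + "D"
```

A match indicates only that the residues fit the pattern; the abstract proposes the motif as a suggested signature, so a hit would still need experimental confirmation of PCD activity.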

5.
ABSTRACT: BACKGROUND: The Escherichia coli species contains a variety of commensal and pathogenic strains, and its intraspecific diversity is extraordinarily high. With the availability of an increasing number of E. coli strain genomes, a more comprehensive concept of their evolutionary history and ecological adaptation can be developed using phylogenomic analyses. In this study, we constructed two types of whole-genome phylogenies based on 34 E. coli strains using collinear genomic segments. The first phylogeny was based on the concatenated collinear regions shared by all of the studied genomes, and the second phylogeny was based on the variable collinear regions that are absent from at least one genome. Intuitively, the first phylogeny is likely to reveal the lineal evolutionary history among these strains (i.e., an evolutionary phylogeny), whereas the latter phylogeny is likely to reflect the whole-genome similarities of extant strains (i.e., a similarity phylogeny). RESULTS: Within the evolutionary phylogeny, the strains were clustered in accordance with known phylogenetic groups and phenotypes. When comparing evolutionary and similarity phylogenies, a concept emerges that Shigella may have originated from at least three distinct ancestors and evolved into a single clade. By scrutinizing the properties that are shared amongst Shigella strains but missing in other E. coli genomes, we found that the common regions of the Shigella genomes were mainly influenced by mobile genetic elements, implying that they may have experienced convergent evolution via horizontal gene transfer. Based on an inspection of certain key branches of interest, we identified several collinear regions that may be associated with the pathogenicity of specific strains. Moreover, by examining the annotated genes within these regions, further detailed evidence associated with pathogenicity was revealed. 
CONCLUSIONS: Collinear regions are reliable genomic features used for phylogenomic analysis among closely related genomes while linking the genomic diversity with phenotypic differences in a meaningful way. The pathogenicity of a strain may be associated with both the arrival of virulence factors and the modification of genomes via mutations. Such phylogenomic studies that compare collinear regions of whole genomes will help to better understand the evolution and adaptation of closely related microbes and E. coli in particular.

6.
7.
During microbial evolution, genome rearrangement increases with increasing sequence divergence. If the relationship between synteny and sequence divergence can be modeled, gene clusters in genomes of distantly related organisms exhibiting anomalous synteny can be identified and used to infer functional conservation. We applied the phylogenetic pairwise comparison method to establish and model a strong correlation between synteny and sequence divergence in all 634 available Archaeal and Bacterial genomes from the NCBI database and four newly assembled genomes of uncultivated Archaea from an acid mine drainage (AMD) community. In parallel, we established and modeled the trend between synteny and functional relatedness in the 118 genomes available in the STRING database. By combining these models, we developed a gene functional annotation method that weights evolutionary distance to estimate the probability of functional associations of syntenous proteins between genome pairs. The method was applied to the hypothetical proteins and poorly annotated genes in newly assembled acid mine drainage Archaeal genomes to add or improve gene annotations. This is the first method to assign possible functions to poorly annotated genes through quantification of the probability of gene functional relationships based on synteny at a significant evolutionary distance, and has the potential for broad application.

8.
The increasing availability of complete genome sequences and the development of new, faster methods for phylogenetic reconstruction allow the exploration of the set of evolutionary trees for each gene in the genome of any species. This has led to the development of new phylogenomic methods. Here, we have compared different phylogenetic and phylogenomic methods in the analysis of the monophyletic origin of insect endosymbionts from the gamma-Proteobacteria, a hotly debated issue with several recent, conflicting reports. We have obtained the phylogenetic tree for each of the 579 identified protein-coding genes in the genome of the primary endosymbiont of carpenter ants, Blochmannia floridanus, after determining their presumed orthologs in 20 additional Proteobacteria genomes. A reference phylogeny reflecting the monophyletic origin of insect endosymbionts was further confirmed with different approaches, which led us to consider it as the presumed species tree. Remarkably, only 43 individual genes produced exactly the same topology as this presumed species tree. Most discrepancies between this tree and those obtained from individual genes or by concatenation of different genes were due to the grouping of Xanthomonadales with beta-Proteobacteria and not to uncertainties over the monophyly of insect endosymbionts. As previously noted, operational genes were more prone to reject the presumed species tree than those included in information-processing categories, but caution should be exercised when selecting genes for phylogenetic inference on the basis of their functional category assignment. We have obtained strong evidence in support of the monophyletic origin of gamma-Proteobacteria insect endosymbionts by a combination of phylogenetic and phylogenomic methods. In our analysis, the use of concatenated genes has proved to be a valuable tool for analyzing primary phylogenetic signals coded in the genomes. 
Nevertheless, other phylogenomic methods such as supertree approaches were useful in revealing alternative phylogenetic signals and should be included in comprehensive phylogenomic studies.

9.
Intrinsically unstructured proteins and their functions   Cited by: 3 (self-citations: 0, citations by others: 3)
Many gene sequences in eukaryotic genomes encode entire proteins or large segments of proteins that lack a well-structured three-dimensional fold. Disordered regions can be highly conserved between species in both composition and sequence and, contrary to the traditional view that protein function equates with a stable three-dimensional structure, disordered regions are often functional, in ways that we are only beginning to discover. Many disordered segments fold on binding to their biological targets (coupled folding and binding), whereas others constitute flexible linkers that have a role in the assembly of macromolecular arrays.

10.
Large-scale two-hybrid screens have generated a wealth of information describing potential protein–protein interactions. When compiled with data from systematic localizations of proteins, mutant screens and other functional tests, a network of interactions among proteins and between proteins and other components of eukaryotic cells can be deduced. These networks can be viewed as maps of the cell, depicting potential signaling pathways and interactive complexes. Most importantly, they provide potential clues to the function of previously uncharacterized proteins. Focusing on recent experiments, we explore these protein-interaction studies and the maps derived from such efforts.

11.
A phylogenomic study of the MutS family of proteins.   Cited by: 23 (self-citations: 4, citations by others: 19)
The MutS protein of Escherichia coli plays a key role in the recognition and repair of errors made during the replication of DNA. Homologs of MutS have been found in many species including eukaryotes, Archaea and other bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. This diversity is even seen within species; many species encode multiple MutS homologs with distinct functions. To better characterize the MutS protein family, I have used a combination of phylogenetic reconstructions and analysis of complete genome sequences. This phylogenomic analysis is used to infer the evolutionary relationships among the MutS family members and to divide the family into subfamilies of orthologs. Analysis of the distribution of these orthologs in particular species and examination of the relationships within and between subfamilies is used to identify likely evolutionary events (e.g. gene duplications, lateral transfer and gene loss) in the history of the MutS family. In particular, evidence is presented that a gene duplication early in the evolution of life resulted in two main MutS lineages, one including proteins known to function in mismatch repair and the other including proteins known to function in chromosome segregation and crossing-over. The inferred evolutionary history of the MutS family is used to make predictions about some of the uncharacterized genes and species included in the analysis. For example, since function is generally conserved within subfamilies and lineages, it is proposed that the function of uncharacterized proteins can be predicted by their position in the MutS family tree. The uses of phylogenomic approaches to the study of genes and genomes are discussed.

12.
Recently a number of computational approaches have been developed for the prediction of protein–protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.

13.
Predicting functions of proteins and alternatively spliced isoforms encoded in a genome is one of the important applications of bioinformatics in the post-genome era. Because experimental characterization of all proteins encoded in a genome by biochemical studies is impractical, bioinformatics methods provide powerful tools for function annotation and prediction. These methods also help narrow the growing sequence-to-function gap. Phylogenetic profiling is a bioinformatics approach to identify the influence of a trait across species and can be employed to infer the evolutionary history of proteins encoded in genomes. Here we propose an improved phylogenetic profile-based method which considers the co-evolution of the reference genome to derive the basic similarity measure, the background phylogeny of target genomes for profile generation, and the assignment of weights to target genomes. The ordering of genomes and the runs of consecutive matches between the proteins were used to define phylogenetic relationships in the approach. We used the Escherichia coli K12 genome as the reference genome, and its 4195 proteins were used in the current analysis. We compared our approach with two existing methods, and our initial results show that its predictions outperform those of both. In addition, we have validated our method using a targeted protein-protein interaction network derived from the protein-protein interaction database STRING. Our preliminary results indicate that improvement in function prediction can be attained by placing the coevolution-based similarity measures and the runs on the same scale instead of computing them on different scales. Our method can be applied at the whole-genome level for annotating hypothetical proteins from prokaryotic genomes.

14.
Protein–protein interactions are intrinsic to virtually every cellular process. Recent breakthroughs in techniques to study protein interactions and the availability of fully sequenced plant genomes have attracted many plant scientists to undertake the first steps in the field of protein interactions. High-throughput screening systems allow the discovery of protein functions. Even without performing laborious functional assays, in planta functional homologues and redundant proteins can be accurately predicted based on protein-interaction maps. Therefore, protein–protein-interaction screenings are an essential supplement to the current functional-genomics toolbox.

15.
MOTIVATION: Phylogenomic profiling is a large-scale comparative genomic method used to infer protein function from evolutionary information, first described in a binary form by Pellegrini et al. (1999). Here, we propose improvements of this approach including the use of normalized Blastp bit scores, a normalization of the matrix of profiles to take into account the evolutionary distances between bacteria, the definition of a phylogenomic neighborhood based on continuous pairwise distances between genes and an original annotation procedure including the computation of a p-value for each functional assignment. RESULTS: The method presented here increases the number of EcoCyc enzymes identified as being evolutionarily related by about 25% with respect to the original binary form (absent/present) method. The fraction of 'false' positives is shown to be smaller than 20%. Based on their phylogenomic relationships, genes of unknown function can then be automatically related to annotated genes. Each predicted gene annotation is associated with a p-value, i.e. the probability of obtaining it by chance. The validity of this method was extensively tested on a large set of genes of known function using the MultiFun database. We find that 50% of 3122 function attributions that can be made at a p-value level of 10^-11 correspond to the actual gene annotation. The method can be readily applied to any newly sequenced microbial genome. In contrast to earlier work on the same topic, our approach avoids the use of arbitrary cut-off values, and provides a reliability estimate of the functional predictions in the form of p-values.
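The contrast this abstract draws between binary (absent/present) profiles and continuous, normalized profiles can be made concrete with a short Python sketch. The abstract does not state how the bit scores are normalized, so the scheme here (dividing each hit's bit score by the query's self-hit score) is an assumption, as are the `cutoff` value, the function names, and all numbers.

```python
from math import sqrt


def normalize_profile(bit_scores, self_score):
    """Scale raw bit scores into [0, 1] using the self-hit score (assumed scheme)."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return [min(max(score, 0.0) / self_score, 1.0) for score in bit_scores]


def binarize(profile, cutoff=0.3):
    """Binary absent/present form; the cutoff is an arbitrary illustrative choice."""
    return [1 if value >= cutoff else 0 for value in profile]


def euclidean(p, q):
    """Continuous distance between two normalized profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical best-hit bit scores of two genes against three genomes.
gene_a = normalize_profile([180.0, 20.0, 160.0], self_score=200.0)
gene_b = normalize_profile([150.0, 0.0, 100.0], self_score=250.0)

same_binary = binarize(gene_a) == binarize(gene_b)
distance = euclidean(gene_a, gene_b)
```

Here both genes collapse to the same binary profile, yet their continuous profiles remain distinguishable. That is the kind of information a hard cut-off discards, and it is consistent with the abstract's stated aim of avoiding arbitrary cut-off values.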

16.
The gene composition of present-day genomes has been shaped by a complicated evolutionary history, resulting in diverse distributions of genes across genomes. The pattern of presence and absence of a gene in different genomes is called its phylogenetic profile. It has been shown that proteins whose encoding genes have highly similar profiles tend to be functionally related: As these genes were gained and lost together, their encoded proteins can probably only perform their full function if both are present. However, a large proportion of genes encoding interacting proteins do not have matching profiles. In this study, we analysed one possible reason for this, namely that phylogenetic profiles can be affected by multi-functional proteins such as shared subunits of two or more protein complexes. We found that by considering triplets of proteins, of which one protein is multi-functional, a large fraction of disturbed co-occurrence patterns can be explained.
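A small Python sketch can illustrate why a shared subunit disturbs pairwise profile matching. It represents each phylogenetic profile as the set of genomes in which the gene is present and checks whether a multi-functional protein's profile equals the union of its two partners' profiles. This union test is one plausible reading of the triplet idea, not necessarily the criterion used in the study; the genome labels and function names are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Similarity of two presence profiles (sets of genome identifiers)."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def explained_by_triplet(shared, partner_1, partner_2):
    """True if the shared protein occurs exactly where either partner occurs."""
    return set(shared) == set(partner_1) | set(partner_2)


# Hypothetical genomes g1..g5: each complex occurs in a different subset,
# and the shared subunit is needed wherever either complex occurs.
complex_1_subunit = {"g1", "g2", "g3"}
complex_2_subunit = {"g4", "g5"}
shared_subunit = {"g1", "g2", "g3", "g4", "g5"}

pair_1 = jaccard(shared_subunit, complex_1_subunit)
pair_2 = jaccard(shared_subunit, complex_2_subunit)
triplet_ok = explained_by_triplet(shared_subunit, complex_1_subunit, complex_2_subunit)
```

Neither pairwise comparison yields a matching profile, but the triplet check succeeds, which is the pattern the abstract attributes to multi-functional proteins.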

17.
Thermodynamic characterization of the relative stabilities of chemical compounds is a pillar of conceptual models in various fields of geosciences. Analogous models applied to genomes can yield new information about the relationship between genomes and their geochemical environments. In this perspective article, we present a chemical and thermodynamic analysis of prokaryotic lineages that have been the target of previous phylogenomic studies of evolutionary adaptation to varying redox conditions. The thermodynamic model development begins by quantifying the effects of hydrogen activity (aH2) and temperature on the relative stabilities of organic compounds with different carbon oxidation states. When applied to proteins instead of metabolites, the same techniques can be used to identify combinations of aH2 and temperature at which reference proteomes for Class I or Class II methanogens are relatively stable. The calculated aH2 values are compatible with reported measurements for habitats of methanogens ranging from highly reducing submarine hydrothermal systems to less reducing environments including methanogenic sediments. In contrast to the transition between the two classes of methanogenic archaea, that between basal and terrestrial groups of Thaumarchaeota (denoting the origin of ammonia-oxidizing archaea) occurs at a less-reducing redox boundary. These examples reveal the consequences of energy minimization driving evolution and show how geochemical calculations involving biomolecules can be used to quantify and better understand the coevolution of the geosphere and biosphere.

18.

Background

The origin of eukaryotes remains a fundamental question in evolutionary biology. Although it is clear that eukaryotic genomes are a chimeric combination of genes of eubacterial and archaebacterial ancestry, the specific ancestry of most eubacterial genes is still unknown. The growing availability of microbial genomes offers the possibility of analyzing the ancestry of eukaryotic genomes and testing previous hypotheses on their origins.

Methodology/Principal Findings

Here, we have applied a phylogenomic analysis to investigate a possible contribution of the Myxococcales to the first eukaryotes. We conducted a conservative pipeline with homologous sequence searches against a genomic sampling of 40 eukaryotic and 357 prokaryotic genomes. The phylogenetic reconstruction showed that several eukaryotic proteins traced to Myxococcales. Most of these proteins were associated with mitochondrial lipid intermediate pathways, particularly enzymes generating reducing equivalents with pivotal roles in fatty acid β-oxidation metabolism. Our data suggest that myxococcal species with the ability to oxidize fatty acids transferred several genes to eubacteria that eventually gave rise to the mitochondrial ancestor. Later, the eukaryotic nucleocytoplasmic lineage acquired those metabolic genes through endosymbiotic gene transfer.

Conclusions/Significance

Our results support a prokaryotic origin, different from α-proteobacteria, for several mitochondrial genes. Our data reinforce a fluid prokaryotic chromosome model in which the mitochondrion appears to be an important entry point for myxococcal genes to enter eukaryotes.

19.
Genes that are clustered on multiple genomes and are likely to functionally interact tend to be gained or lost together during genome evolution. Here, we demonstrate that exceptions to this pattern indicate relatively distant functional interactions between the encoded proteins. Hence, this can be used to divide predicted clusters of functionally interacting proteins into sub-clusters, and as such, to refine the prediction of their function and functional interactions.

20.
There are currently 151 plants with draft genomes available, but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome-wide predictions in plant species. Here, we explore the use of species- and lineage-specific models for the prediction of DNA-binding proteins in plants. We show that a species-specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant-specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.
