20 similar articles found.
1.
Distribution of orphan metabolic activities (cited 2 times: 0 self-citations, 2 citations by others)
A significant fraction (30-40%) of known metabolic activities is currently orphan. Although orphan activities have been biochemically characterized, we do not know a single gene responsible for these reactions in any organism. The problem of orphan activities represents one of the major challenges of modern biochemistry. We analyze the distribution of orphans across biochemical space, through years of enzymatic characterization, and by biological organisms. We find that orphan metabolic activities have been accumulating for many decades. They are widely distributed across enzymatic functional space and metabolic network neighborhoods. Although orphans are relatively more abundant in less studied species, over half of orphan reactions have been experimentally characterized in more than one organism. Shrinking the space of orphan activities will likely require a close collaboration between computational and experimental laboratories.
2.
MOTIVATION: Genes with identical patterns of occurrence across the phyla tend to function together in the same protein complexes or participate in the same biochemical pathway. However, the requirement that the profiles be identical (i) severely restricts the number of functional links that can be established by such phylogenetic profiling; (ii) limits detection to very strong functional links, failing to capture relations between genes that are not in the same pathway, but nevertheless subserve a common function and (iii) misses relations between analogous genes. Here we present and apply a method for relaxing the restriction, based on the probability that a given arbitrary degree of similarity between two profiles would occur by chance, with no biological pressure. Function is then inferred at any desired level of confidence. RESULTS: We derive an expression for the probability distribution of a given number of chance co-occurrences of a pair of non-homologous orthologs across a set of genomes. The method is applied to 2905 clusters of orthologous genes (COGs) from 44 fully sequenced microbial genomes representing all three domains of life. Among the results are the following. (1) Of the 51 000 annotated intrapathway gene pairs, 8935 are linked at a level of significance of 0.01. This is over 30-fold greater than the 271 intrapathway pairs obtained at the same confidence level when identical profiles are used. (2) Of the 540 000 interpathway gene pairs, some 65 000 are linked at the 0.01 level of significance, some 12 standard deviations beyond the number expected by chance at this confidence level. We speculate that many of these links involve nearest-neighbor path, and discuss some examples. (3) The difference in the percentage of linked interpathway and intrapathway genes is highly significant, consistent with the intuitive expectation that genes in the same pathway are generally under greater selective pressure than those that are not. (4) The method appears to recover metabolic networks well. This is illustrated by the TCA cycle, which is recovered as a highly connected, weighted edge network of 30 of its 31 COGs. (5) The fraction of pairs having a common pathway is a symmetric function of the Hamming distance between their profiles. This finding, that the functional correlation between profiles with near maximum Hamming distance is as large as between profiles with near zero Hamming distance, and as statistically significant, is plausibly explained if the former group represents analogous genes.
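The abstract above does not give its expression for chance co-occurrence, so the following is only a minimal sketch of one common null model, not the authors' formula. It assumes that, with no biological pressure, the genomes containing one gene are a random subset of all genomes, which makes the number of co-occurrences hypergeometric. The function name `chance_cooccurrence_pvalue` and the toy profiles are illustrative assumptions.

```python
from math import comb


def chance_cooccurrence_pvalue(profile_a, profile_b):
    """Probability of at least the observed number of co-occurrences.

    Assumption (not from the abstract): a hypergeometric null model in which
    the genomes containing gene B are drawn at random from all genomes.
    Profiles are equal-length sequences of 0/1 presence flags.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    n = len(profile_a)                      # number of genomes
    a = sum(profile_a)                      # genomes containing gene A
    b = sum(profile_b)                      # genomes containing gene B
    k = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    # Upper tail: P(co-occurrences >= k) under random placement.
    tail = sum(comb(a, i) * comb(n - a, b - i) for i in range(k, min(a, b) + 1))
    return tail / comb(n, b)


# Toy example: two genes present in the same 3 of 6 genomes.
p_identical = chance_cooccurrence_pvalue([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0])
```

A pair would be called linked when this probability falls below the chosen significance level, such as the 0.01 used in the abstract. This one-sided test only scores co-occurrence; detecting the anti-correlated (near maximum Hamming distance) pairs mentioned in point (5) would need the opposite tail as well.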
3.
4.
5.
Ronesh Sharma Shiu Kumar Tatsuhiko Tsunoda Ashwini Patil Alok Sharma 《BMC bioinformatics》2016,17(19):504
Background
Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs in IDPs using computational methods is a challenging task.
Methods
In this study, we introduce hidden Markov model (HMM) profiles to accurately identify the location of MoRFs in disordered protein sequences. Using a windowing technique, HMM profiles are utilised to extract features from protein sequences and support vector machines (SVM) are used to calculate a propensity score for each residue. Two different SVM kernels with high noise tolerance are evaluated with a varying window size and the scores of the SVM models are combined to generate the final propensity score to predict MoRF residues. The SVM models are designed to extract maximal information between MoRF residues, their neighboring regions (Flanks) and the remainder of the sequence (Others).
Results
To evaluate the proposed method, its performance was compared to that of two other MoRF predictors, MoRFpred and ANCHOR. The results show that the proposed method outperforms these two predictors.
Conclusions
Using HMM profiles as a source of features, the proposed method shows improvement in predicting MoRFs in disordered protein sequences.
6.
Patients who have undergone renal transplantation are monitored longitudinally at irregular time intervals over 10 years or more. This yields a set of biochemical and physiological markers containing valuable information to anticipate a failure of the graft. A general linear, generalized linear, or nonlinear mixed model is used to describe the longitudinal profile of each marker. To account for the correlation between markers, the univariate mixed models are combined into a multivariate mixed model (MMM) by specifying a joint distribution for the random effects. Due to the high number of markers, a pairwise modeling strategy, where all possible pairs of bivariate mixed models are fitted, is used to obtain parameter estimates for the MMM. These estimates are used in a Bayes rule to obtain, at each point in time, the prognosis for long-term success of the transplant. It is shown that allowing the markers to be correlated can improve this prognosis.
7.
Shiri Freilich Leon Goldovsky Assaf Gottlieb Eric Blanc Sophia Tsoka Christos A Ouzounis 《BMC bioinformatics》2009,10(1):355
Background
Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database.
8.
Douglas B. Kell 《Trends in biotechnology》1998,16(12):290-493
9.
Fan Zhang Yuanyuan Zhang Chaofu Ke Ang Li Wenjie Wang Kai Yang Huijuan Liu Hongyu Xie Kui Deng Weiwei Zhao Chunyan Yang Ge Lou Yan Hou Kang Li 《Metabolomics : Official journal of the Metabolomic Society》2018,14(5):65
Background
Previous metabolomic studies have revealed that plasma metabolic signatures may predict epithelial ovarian cancer (EOC) recurrence. However, few studies have performed metabolic profiling of pre- and post-operative specimens to investigate EOC prognostic biomarkers.
Objective
The aims of our study were to compare the predictive performance of pre- and post-operative specimens and to create a better model for recurrence by combining biomarkers from both metabolic signatures.
Methods
Thirty-five paired plasma samples were collected from 35 EOC patients before and after surgery. The patients were followed up until December 2016 to obtain recurrence information. Metabolomics using rapid resolution liquid chromatography–mass spectrometry was performed to identify metabolic signatures related to EOC recurrence. The support vector machine model was employed to predict EOC recurrence using identified biomarkers.
Results
Global metabolomic profiles distinguished recurrent from non-recurrent EOC using both pre- and post-operative plasma. Ten common significant biomarkers, hydroxyphenyllactic acid, uric acid, creatinine, lysine, 3-(3,5-diiodo-4-hydroxyphenyl) lactate, phosphohydroxypyruvic acid, carnitine, coproporphyrinogen, l-beta-aspartyl-l-glutamic acid and 24,25-hydroxyvitamin D3, were identified as predictive biomarkers for EOC recurrence. The area under the receiver operating characteristic curve (AUC) values in pre- and post-operative plasma were 0.815 and 0.909, respectively; the AUC value after combining the two sets reached 0.964.
Conclusion
Plasma metabolomic analysis could be used to predict EOC recurrence. While post-operative biomarkers have a predictive advantage over pre-operative biomarkers, combining pre- and post-operative biomarkers showed the best predictive performance and has great potential for predicting recurrent EOC.
10.
Background
Gene expression data extracted from microarray experiments have been used to study the difference between mRNA abundance of genes under different conditions. In one such experiment, thousands of genes are measured simultaneously, which provides a high-dimensional feature space for discriminating between different sample classes. However, most of these dimensions are not informative about the between-class difference, and add noise to the discriminant analysis.
Results
In this paper we propose and study feature selection methods that evaluate the "informativeness" of a set of genes. Two measures of information based on multigene expression profiles are considered for a backward information-driven screening approach for selecting important gene features. By considering multigene expression profiles, we are able to utilize interaction information among these genes. Using a breast cancer data set, we illustrate our methods and compare them to the performance of existing methods.
Conclusion
We illustrate in this paper that methods considering gene-gene interactions have better classification power in gene expression analysis. In our results, we identify important genes with relatively large p-values from single gene tests. This indicates that these are genes with weak marginal information but strong interaction information, which will be overlooked by strategies that only examine individual genes.
11.
Two different methods of using paralogous genes for phylogenetic inference have been proposed: reconciled trees (or gene tree parsimony) and uninode coding. Gene tree parsimony suffers from 10 serious problems, including differential weighting of nucleotide and gap characters, undersampling which can be misinterpreted as synapomorphy, all of the characters not being allowed to interact, and conflict between gene trees being given equal weight, regardless of branch support. These problems are largely avoided by using uninode coding. The uninode coding method is elaborated to address multiple gene duplications within a single gene tree family and handle problems caused by lack of gene tree resolution. An example of vertebrate phylogeny inferred from nine genes is reanalyzed using uninode coding. We suggest that uninode coding be used instead of gene tree parsimony for phylogenetic inference from paralogous genes.
12.
13.
M. A. Pyatnitskiy A. V. Lisitsa A. I. Archakov 《Biochemistry (Moscow) Supplemental Series B: Biomedical Chemistry》2010,4(1):42-48
Computational interactomics deals with the prediction of functionally related proteins. One approach to this problem using comparative genomics consists in analyzing similarities between the phylogenetic profiles of proteins. In contrast to most methods, which predict only pairwise interactions between proteins, in the present work we have applied cluster analysis techniques in order to find modules of functionally related proteins. We have performed cluster analysis of the phylogenetic profiles of E. coli proteins using several clustering techniques and various modes for estimating distances between profiles. We report here that the best correspondence between the composition of the resultant clusters and known metabolic pathways is achieved using Ward's clustering together with the Hamming distance. The proposed technique of assessing predictions of the modules of functionally related proteins can be used for comparative analysis of different algorithms for computational interactomics.
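The combination named in this abstract, Ward's clustering with the Hamming distance, can be illustrated on toy data. The sketch below is not the authors' pipeline; it relies on the fact that for 0/1 profiles the Hamming distance equals the squared Euclidean distance, so Ward's merge cost can be computed from cluster centroids. The function names, the naive pairwise search, and the example profiles are all illustrative assumptions.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles differ."""
    return sum(x != y for x, y in zip(p, q))


def ward_clusters(profiles, n_clusters):
    """Naive Ward agglomeration of binary profiles into n_clusters groups.

    profiles: dict mapping a protein name to a 0/1 list (hypothetical input).
    For binary vectors the squared Euclidean distance equals the Hamming
    distance, so the Ward cost below is expressed through centroids.
    """
    # Each cluster is (member names, centroid vector).
    clusters = [([name], [float(v) for v in prof]) for name, prof in profiles.items()]
    while len(clusters) > max(n_clusters, 1):
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            (mi, ci), (mj, cj) = clusters[i], clusters[j]
            ni, nj = len(mi), len(mj)
            sq_dist = sum((x - y) ** 2 for x, y in zip(ci, cj))
            # Increase in within-cluster sum of squares if i and j merge.
            cost = ni * nj / (ni + nj) * sq_dist
            if best is None or cost < best[0]:
                best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        centroid = [(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((mi + mj, centroid))
    return [sorted(members) for members, _ in clusters]


# Toy profiles over five reference genomes (made-up data).
toy_profiles = {
    "geneA": [1, 1, 0, 0, 1],
    "geneB": [1, 1, 0, 0, 1],
    "geneC": [1, 1, 0, 1, 1],
    "geneD": [0, 0, 1, 1, 0],
    "geneE": [0, 0, 1, 1, 0],
}
modules = ward_clusters(toy_profiles, n_clusters=2)
```

The pairwise search is cubic in the number of proteins, which is fine for illustration but too slow for a whole proteome; a library implementation of hierarchical clustering would be the practical choice at that scale.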
14.
Appala Raju Kotaru Khader Shameer Pandurangan Sundaramurthy Ramesh Chandra Joshi 《Bioinformation》2013,9(7):368-374
Predicting functions of proteins and alternatively spliced isoforms encoded in a genome is one of the important applications of bioinformatics in the post-genome era. Due to the practical limitation of experimental characterization of all proteins encoded in a genome using biochemical studies, bioinformatics methods provide powerful tools for function annotation and prediction. These methods also help minimize the growing sequence-to-function gap. Phylogenetic profiling is a bioinformatics approach to identify the influence of a trait across species and can be employed to infer the evolutionary history of proteins encoded in genomes. Here we propose an improved phylogenetic profile-based method which considers the co-evolution of the reference genome to derive the basic similarity measure, the background phylogeny of target genomes for profile generation and assigning weights to target genomes. The ordering of genomes and the runs of consecutive matches between the proteins were used to define phylogenetic relationships in the approach. We used the Escherichia coli K12 genome as the reference genome and its 4195 proteins were used in the current analysis. We compared our approach with two existing methods and our initial results show that the predictions have outperformed two of the existing approaches. In addition, we have validated our method using a targeted protein-protein interaction network derived from the protein-protein interaction database STRING. Our preliminary results indicate that improvement in function prediction can be attained by using coevolution-based similarity measures and the runs on to the same scale instead of computing them in different scales. Our method can be applied at the whole-genome level for annotating hypothetical proteins from prokaryotic genomes.
15.
MOTIVATION: The increasing availability of complete genome sequences provides an excellent opportunity for the further development of tools for functional studies in proteomics. Several experimental approaches and in silico algorithms have been developed to cluster proteins into networks of biological significance that may provide new biological insights, especially into understanding the functions of many uncharacterized proteins. Among these methods, the phylogenetic profiles method has been widely used to predict protein-protein interactions. It involves the selection of reference organisms and identification of homologous proteins. Up to now, no published report has systematically studied the effects of the reference genome selection and the identification of homologous proteins upon the accuracy of this method. RESULTS: In this study, we optimized the phylogenetic profiles method by integrating phylogenetic relationships among reference organisms and sequence homology information to improve prediction accuracy. Our results revealed that the selection of the set of reference organisms and the criteria for homology identification are two critical factors for the prediction accuracy of this method. Our refined phylogenetic profiles method shows greater performance and potentially provides more reliable functional linkages compared with previous methods.
16.
Background
Using computational database searches, we have demonstrated previously that no gene sequences could be found for at least 36% of enzyme activities that have been assigned an Enzyme Commission number. Here we present a follow-up literature-based survey involving a statistically significant sample of such "orphan" activities. The survey was intended to determine whether sequences for these enzyme activities are truly unknown, or whether these sequences are absent from the public sequence databases but can be found in the literature.
17.
Background
In silico candidate gene prioritisation (CGP) aids the discovery of gene functions by ranking genes according to an objective relevance score. While several CGP methods have been described for identifying human disease genes, corresponding methods for prokaryotic gene function discovery are lacking. Here we present two prokaryotic CGP methods, based on phylogenetic profiles, to assist with this task.
18.
The phylogenetic profile of a gene is a reflection of its evolutionary history and can be defined as the differential presence or absence of a gene in a set of reference genomes. It has been employed to facilitate the prediction of gene functions. However, the hypothesis that the application of this concept can also facilitate the discovery of bacterial virulence factors has not been fully examined. In this paper, we test this hypothesis and report a computational pipeline designed to identify previously unknown bacterial virulence genes using group B streptococcus (GBS) as an example. Phylogenetic profiles of all GBS genes across 467 bacterial reference genomes were determined by candidate-against-all BLAST searches, which were then used to identify candidate virulence genes by machine learning models. Evaluation experiments with known GBS virulence genes suggested good functional and model consistency in cross-validation analyses (areas under ROC curve, 0.80 and 0.98 respectively). Inspection of the top-10 genes in each of the 15 virulence functional groups revealed at least 15 (of 119) homologous genes implicated in virulence in other human pathogens but previously unrecognized as potential virulence genes in GBS. Among these highly-ranked genes, many encode hypothetical proteins with possible roles in GBS virulence. Thus, our approach has led to the identification of a set of genes potentially affecting the virulence potential of GBS, which are potential candidates for further in vitro and in vivo investigations. This computational pipeline can also be extended to in silico analysis of virulence determinants of other bacterial pathogens.
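The definition at the start of this abstract, presence or absence of a gene across a set of reference genomes, maps onto a small data structure. The sketch below assumes the BLAST results have already been reduced to the best E-value per gene and genome; the input layout, the function name `build_profiles`, the genome labels and the 1e-5 cutoff are illustrative assumptions and are not taken from the paper.

```python
def build_profiles(best_hits, genomes, evalue_cutoff=1e-5):
    """Turn per-genome best BLAST E-values into 0/1 phylogenetic profiles.

    best_hits: {gene: {genome: best E-value}} (hypothetical, pre-parsed input).
    genomes: ordered list of reference genome identifiers.
    evalue_cutoff: assumed presence threshold, not a value from the source.
    """
    profiles = {}
    for gene, hits in best_hits.items():
        # A genome with no hit at all counts as absence.
        profiles[gene] = [
            1 if hits.get(genome, float("inf")) <= evalue_cutoff else 0
            for genome in genomes
        ]
    return profiles


# Made-up example with three reference genomes.
reference_genomes = ["genome_1", "genome_2", "genome_3"]
example_hits = {
    "gene_x": {"genome_1": 1e-40, "genome_3": 2e-8},
    "gene_y": {"genome_2": 0.5},
}
example_profiles = build_profiles(example_hits, reference_genomes)
```

Each resulting vector can then serve as one feature row for a classifier, which is the role the abstract assigns to its machine learning models.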
19.
Gene evolution has long been thought to be primarily driven by duplication and rearrangement mechanisms. However, every evolutionary lineage harbours orphan genes that lack homologues in other lineages and whose evolutionary origin is only poorly understood. Orphan genes might arise from duplication and rearrangement processes followed by fast divergence; however, de novo evolution out of non-coding genomic regions is emerging as an important additional mechanism. This process appears to provide raw material continuously for the evolution of new gene functions, which can become relevant for lineage-specific adaptations.
20.
Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding for proteins that catalyze reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
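The first step described above, grouping co-localized genes whose reactions share metabolites, can be approximated with a simple grouping pass. This is a heavily simplified sketch and not the CanOE algorithm: it assumes genes are given in chromosomal order, treats two annotated genes as linked when they lie within a hypothetical `max_gap` positions of each other and their reactions share at least one metabolite, and reports connected groups. All names and the toy data are illustrative.

```python
def find_metabolons(gene_order, gene_metabolites, max_gap=2):
    """Group nearby genes whose reactions share a metabolite.

    gene_order: gene identifiers in chromosomal order.
    gene_metabolites: {gene: set of metabolites of its reaction(s)};
        genes missing from this dict are treated as un-annotated.
    max_gap: assumed co-localization window, counted in gene positions.
    """
    parent = {g: g for g in gene_order if g in gene_metabolites}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g in enumerate(gene_order):
        if g not in parent:
            continue
        for h in gene_order[i + 1:i + 1 + max_gap]:
            if h in parent and gene_metabolites[g] & gene_metabolites[h]:
                parent[find(h)] = find(g)

    groups = {}
    for g in parent:
        groups.setdefault(find(g), []).append(g)
    # Keep only groups with at least two linked genes.
    return [members for members in groups.values() if len(members) > 1]


# Made-up six-gene region; g3 and g5 are un-annotated.
toy_order = ["g1", "g2", "g3", "g4", "g5", "g6"]
toy_metabolites = {
    "g1": {"allantoin", "allantoate"},
    "g2": {"allantoate", "ureidoglycolate"},
    "g4": {"pyruvate"},
    "g6": {"ureidoglycolate"},
}
toy_metabolons = find_metabolons(toy_order, toy_metabolites)
```

In this toy region g2 and g6 share a metabolite but are too far apart to be linked, so only g1 and g2 form a group. The un-annotated genes lying next to such a group are the kind of candidates that the second step would associate with gene-less reactions.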