Similar literature
1.
The reliabilities of parsimony-based and likelihood-based methods for inferring positive selection at single amino acid sites were studied using the nucleotide sequences of human leukocyte antigen (HLA) genes, in which positive selection is known to be operating at the antigen recognition site. The results indicate that the inference by parsimony-based methods is robust to the use of different evolutionary models and generally more reliable than that by likelihood-based methods. In contrast, the results obtained by likelihood-based methods depend on the models and on the initial parameter values used. It is sometimes difficult to obtain the maximum likelihood estimates of parameters for a given model, and the results obtained may be false negatives or false positives depending on the initial parameter values. It is therefore preferable to use parsimony-based methods as long as the number of sequences is relatively large and the branch lengths of the phylogenetic tree are relatively small.

2.
Inferring positive selection at single amino acid sites is of biological and medical importance. Parsimony-based and likelihood-based methods have been developed for this purpose, but the reliabilities of these methods are not well understood. Because the evolutionary models assumed in these methods are only rough approximations to reality, it is desirable that the methods are not very sensitive to violation of the assumptions made. In this study we show by computer simulation that the likelihood-based method is sensitive to violation of the assumptions and produces many false-positive results under certain conditions, whereas the parsimony-based method tends to be conservative. These observations, together with those from previous studies, suggest that the positively selected sites inferred by the parsimony-based method are more reliable than those inferred by the likelihood-based method.

3.
MOTIVATION: Availability of large volumes of genomic and enzymatic data for taxonomically and phenotypically diverse organisms allows for exploration of the adaptive mechanisms that led to diversification of enzymatic functions. We present Chisel, a computational framework and a pipeline for an automated, high-resolution analysis of evolutionary variations of enzymes. Chisel allows automatic as well as interactive identification and characterization of enzymatic sequences. Such knowledge can be utilized for comparative genomics, microbial diagnostics, metabolic engineering, drug design and analysis of metagenomes. RESULTS: Chisel is a comprehensive resource that contains 8575 clusters and subsequent computational models specific for 939 distinct enzymatic functions and, when data is sufficient, their taxonomic variations. Applications of Chisel to identification of enzymatic sequences in newly sequenced genomes, analysis of organism-specific metabolic networks, 'binning' of metagenomes and other biological problems are presented. We also provide a thorough analysis of Chisel performance against other similar resources and manual annotations on the Shewanella oneidensis MR1 genome.

4.
Composition and functions of microbial communities affect important traits in diverse hosts, from crops to humans. Yet, mechanistic understanding of how metabolism of individual microbes is affected by the community composition and metabolite leakage is lacking. Here, we first show that the consensus of automatically generated metabolic reconstructions improves the quality of the draft reconstructions, measured by comparison to reference models. We then devise an approach for gap filling, termed COMMIT, that considers metabolites for secretion based on their permeability and the composition of the community. By applying COMMIT with two soil communities from the Arabidopsis thaliana culture collection, we could significantly reduce the gap-filling solution in comparison to filling gaps in individual reconstructions without affecting the genomic support. Inspection of the metabolic interactions in the soil communities allows us to identify microbes with community roles of helpers and beneficiaries. Therefore, COMMIT offers a versatile fully automated solution for large-scale modelling of microbial communities for diverse biotechnological applications.

5.
6.
7.
Metabolomics experiments seldom achieve their aim of comprehensively covering the entire metabolome. However, important information can be gleaned even from sparse datasets, which can be facilitated by placing the results within the context of known metabolic networks. Here we present a method that allows the automatic assignment of identified metabolites to positions within known metabolic networks, and, furthermore, allows automated extraction of sub-networks of biological significance. This latter feature is possible by use of a gap-filling algorithm. The utility of the algorithm in reconstructing and mining of metabolomics data is shown on two independent datasets generated with LC–MS LTQ-Orbitrap mass spectrometry. Biologically relevant metabolic sub-networks were extracted from both datasets. Moreover, a number of metabolites, whose presence eluded automatic selection within mass spectra, could be identified retrospectively by virtue of their inferred presence through gap filling.

8.
Island systems have long been useful models for understanding lineage diversification in a geographic context, especially pertaining to the importance of dispersal in the origin of new clades. Here we use a well-resolved phylogeny of the flowering plant genus Cyrtandra (Gesneriaceae) from the Pacific Islands to compare four methods of inferring ancestral geographic ranges in islands: two developed for character-state reconstruction that allow only single-island ranges and do not explicitly associate speciation with range evolution (Fitch parsimony [FP; parsimony-based] and stochastic mapping [SM; likelihood-based]) and two methods developed specifically for ancestral range reconstruction, in which widespread ranges (spanning islands) are integral to inferences about speciation scenarios (dispersal-vicariance analysis [DIVA; parsimony-based] and dispersal-extinction-cladogenesis [DEC; likelihood-based]). The methods yield conflicting results, which we interpret in light of their respective assumptions. FP exhibits the least power to unequivocally reconstruct ranges, likely due to a combination of having flat (uninformative) transition costs and not using branch length information. SM reconstructions generally agree with a prior hypothesis about dispersal-driven speciation across the Pacific, despite the conceptual mismatch between its character-based model and this mode of range evolution. In contrast with narrow extant ranges for species of Cyrtandra, DIVA reconstructs broad ancestral ranges at many nodes. DIVA results also conflict with geological information on island ages; we attribute these conflicts to the parsimony criterion not considering branch lengths or time, as well as vicariance being the sole means of divergence for widespread ancestors. 
DEC analyses incorporated geological information on island ages and allowed prior hypotheses about range size and dispersal rates to be evaluated in a likelihood framework and gave more nuanced inferences about range evolution and the geography of speciation than other methods tested. However, ancestral ranges at several nodes could not be conclusively resolved, due possibly to uncertainty in the phylogeny or the relative complexity of the underlying model. Of the methods tested, SM and DEC both converge on plausible hypotheses for area range histories in Cyrtandra, due in part to the consideration of branch lengths and/or timing of events. We suggest that DEC model-based methods for ancestral range inference could be improved by adopting a Bayesian SM approach, in which stochastic sampling of complete geographic histories could be integrated over alternative phylogenetic topologies. Likelihood-based estimates of ancestral ranges for Cyrtandra suggest a major dispersal route into the Pacific through the islands of Fiji and Samoa, motivating future biogeographic investigation of this poorly known region.

9.

Background

Despite several recent advances in the automated generation of draft metabolic reconstructions, the manual curation of these networks to produce high quality genome-scale metabolic models remains a labour-intensive and challenging task.

Results

We present PathwayBooster, an open-source software tool to support the manual comparison and curation of metabolic models. It combines gene annotations from GenBank files and other sources with information retrieved from the metabolic databases BRENDA and KEGG to produce a set of pathway diagrams and reports summarising the evidence for the presence of a reaction in a given organism’s metabolic network. By comparing multiple sources of evidence within a common framework, PathwayBooster assists the curator in the identification of likely false positive (misannotated enzyme) and false negative (pathway hole) reactions. Reaction evidence may be taken from alternative annotations of the same genome and/or a set of closely related organisms.

Conclusions

By integrating and visualising evidence from multiple sources, PathwayBooster reduces the manual effort required in the curation of a metabolic model. The software is available online at http://www.theosysbio.bio.ic.ac.uk/resources/pathwaybooster/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0447-2) contains supplementary material, which is available to authorized users.

10.
MOTIVATION: A number of metabolic databases are available electronically, some with features for querying and visualizing metabolic pathways and regulatory networks. We present a unifying, systematic approach based on PETRI nets for storing, displaying, comparing, searching and simulating such nets from a number of different sources. RESULTS: Information from each data source is extracted and compiled into a PETRI net. Such PETRI nets then make it possible to investigate the (differential) content in metabolic databases, to map and integrate genomic information and functional annotations, to compare sequence and metabolic databases with respect to their functional annotations, and to define, generate and search paths and pathways in nets. We present an algorithm to systematically generate all pathways satisfying additional constraints in such PETRI nets. Finally, based on the set of valid pathways, so-called differential metabolic displays (DMDs) are introduced to exhibit specific differences between biological systems, i.e. different developmental states, disease states, or different organisms, on the level of paths and pathways. DMDs will be useful for target finding and function prediction, especially in the context of the interpretation of expression data.

11.
MOTIVATION: Beyond methods for a gene-wise annotation and analysis of sequenced genomes, new automated methods for functional analysis on a higher level are needed. The identification of realized metabolic pathways provides valuable information on gene expression and regulation. Detection of incomplete pathways helps to improve a constantly evolving genome annotation or discover alternative biochemical pathways. To utilize automated genome analysis on the level of metabolic pathways, new methods for the dynamic representation and visualization of pathways are needed. RESULTS: PathFinder is a tool for the dynamic visualization of metabolic pathways based on annotation data. Pathways are represented as directed acyclic graphs; graph layout algorithms accomplish the dynamic drawing and visualization of the metabolic maps. A more detailed analysis of the input data on the level of biochemical pathways helps to identify genes and detect improper parts of annotations. As a Relational Database Management System (RDBMS)-based internet application, PathFinder reads a list of EC numbers or a given annotation in EMBL or GenBank format and dynamically generates pathway graphs.

12.
Constraint-based, genome-scale metabolic models are an essential tool to guide metabolic engineering. However, they lack the detail and time dimension that kinetic models with enzyme dynamics offer. Model reduction can be used to bridge the gap between the two methods and allow for the integration of kinetic models into the Design-Build-Test-Learn cycle. Here we show that these reduced-size models can be representative of the dynamics of the original model and demonstrate the automated generation and parameterisation of such models. Using these minimal models of metabolism could allow for further exploration of dynamic responses in metabolic networks.

13.
14.
High-throughput data from various omics and sequencing techniques have rendered automated metabolic network reconstruction a highly relevant problem. Our approach reflects the inherent probabilistic nature of the steps involved in metabolic network reconstruction. Here, the goal is to arrive at networks which combine probabilistic information with the possibility to obtain a small number of disconnected network constituents by reduction of a given preliminary probabilistic metabolic network. We define automated metabolic network reconstruction as an optimization problem on a four-partite graph (nodes representing genes, enzymes, reactions, and metabolites) which integrates: (1) probabilistic information obtained from the existing process for metabolic reconstruction from a given genome, (2) connectedness of the raw metabolic network, and (3) clustering of components in the reconstructed metabolic network. The practical implications of our theoretical analysis refer to the quality of reconstructed metabolic networks and shed light on the problem of finding more efficient and effective methods for automated reconstruction. Our main contributions include: a completeness result for the defined problem, a polynomial-time approximation algorithm, and an optimal polynomial-time algorithm for trees. Moreover, we exemplify our approach by the reconstruction of the sucrose biosynthesis pathway in Chlamydomonas reinhardtii.

15.
Hierarchical analysis of dependency in metabolic networks (total citations: 7; self-citations: 0; citations by others: 7)
MOTIVATION: Elucidation of metabolic networks for an increasing number of organisms reveals that even small networks can contain thousands of reactions and chemical species. The intimate connectivity between components complicates their decomposition into biologically meaningful sub-networks. Moreover, traditional higher-order representations of metabolic networks as metabolic pathways suffer from the lack of a rigorous definition, yielding pathways of disparate content and size. RESULTS: We introduce a hierarchical representation that emphasizes the gross organization of metabolic networks in largely independent pathways and sub-systems at several levels of independence. The approach highlights the coupling of different pathways and the shared compounds responsible for those couplings. By assessing our results on Escherichia coli (E.coli metabolic reactions, Genetic Circuits Research Group, University of California, San Diego, http://gcrg.ucsd.edu/organisms/ecoli.html, 'model v 1.01. reactions') against accepted biochemical annotations, we provide the first systematic synopsis of an organism's metabolism. Comparison with operons of E.coli shows that low-level clusters are reflected in genome organization and gene regulation. AVAILABILITY: Source code, data sets and supplementary information are available at http://www.mas.ecp.fr/labo/equipe/gagneur/hierarchy/hierarchy.html

16.
Significant advances in system-level modeling of cellular behavior can be achieved based on constraints derived from genomic information and on optimality hypotheses. For steady-state models of metabolic networks, mass conservation and reaction stoichiometry impose linear constraints on metabolic fluxes. Different objectives, such as maximization of growth rate or minimization of flux distance from a reference state, can be tested in different organisms and conditions. In particular, we have suggested that the metabolic properties of mutant bacterial strains are best described by an algorithm that performs a minimization of metabolic adjustment (MOMA) upon gene deletion. The increasing availability of many annotated genomes paves the way for a systematic application of these flux balance methods to a large variety of organisms. However, such a high throughput goal crucially depends on our capacity to build metabolic flux models in a fully automated fashion. Here we describe a pipeline for generating models from annotated genomes and discuss the current obstacles to full automation. In addition, we propose a framework for the integration of flux modeling results and high throughput proteomic data, which can potentially help in the inference of whole-cell kinetic parameters.
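
The two objectives named in this abstract can be written out in the notation commonly used for flux balance methods. This is a sketch under assumed notation, none of which appears in the abstract itself: S is the stoichiometric matrix, v the flux vector, c a vector of objective weights (for example, selecting a biomass reaction), l and u the flux bounds, w a reference (wild-type) flux distribution, and D the set of reactions disabled by a gene deletion.

```latex
% Steady-state mass balance gives linear constraints on the fluxes;
% flux balance analysis maximizes a linear objective such as growth rate.
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S v = 0, \\
                 & l \le v \le u .
\end{aligned}

% Minimization of metabolic adjustment: after a deletion, pick the feasible
% flux vector closest (in Euclidean distance) to the reference state w.
\begin{aligned}
\min_{v}\quad & \lVert v - w \rVert_2^{2} \\
\text{s.t.}\quad & S v = 0, \\
                 & l \le v \le u, \\
                 & v_j = 0 \quad \text{for all } j \in D .
\end{aligned}
```

The first problem is a linear program and the second a quadratic program over the same feasible set, restricted by the deletion.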

17.
Genome-scale metabolic models are central in connecting genotypes to metabolic phenotypes. However, even for well studied organisms, such as Escherichia coli, draft networks do not contain a complete biochemical network. Missing reactions are referred to as gaps. These gaps need to be filled to enable functional analysis, and gap-filling choices influence model predictions. To investigate whether functional networks existed where all gap-filling reactions were supported by sequence similarity to annotated enzymes, four draft networks were supplemented with all reactions from the Model SEED database for which minimal sequence similarity was found in their genomes. Quadratic programming revealed that the number of reactions that could partake in a gap-filling solution was vast: 3,270 in the case of E. coli, where 72% of the metabolites in the draft network could connect a gap-filling solution. Nonetheless, no network could be completed without the inclusion of orphaned enzymes, suggesting that parts of the biochemistry integral to biomass precursor formation are uncharacterized. However, many gap-filling reactions were well determined, and the resulting networks showed improved prediction of gene essentiality compared with networks generated through canonical gap filling. In addition, gene essentiality predictions that were sensitive to poorly determined gap-filling reactions were of poor quality, suggesting that damage to the network structure resulting from the inclusion of erroneous gap-filling reactions may be predictable.

18.

Background

Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need for automated computational tools to annotate the genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data.

Results

In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products in article abstracts from PubMed, and translates the discovered knowledge into Gene Ontology (GO) concepts, a widely-used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns", and a semantic matching framework to locate phrases matching a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN reached a precision level of 78% at a recall level of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general.
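
The abstract reports precision and recall separately and does not give a combined score. As an illustration only, the sketch below computes the F1 score (the harmonic mean of precision and recall) from the two reported figures; the function name `f1_score` and the variable `geann_f1` are hypothetical and are not part of GEANN.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract: 78% precision at 61% recall.
geann_f1 = f1_score(0.78, 0.61)
print(round(geann_f1, 3))
```

Because the harmonic mean is pulled toward the smaller of the two values, the combined score sits closer to the recall figure than to the precision figure.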

Conclusion

GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieves high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme with respect to "exact matching" with the advantage of locating approximate pattern occurrences with similar semantics. Relatively low recall performance of our pattern-based approach may be enhanced either by employing a probabilistic annotation framework based on the annotation neighbourhoods in textual data, or, alternatively, the statistical enrichment threshold may be adjusted to lower values for applications that put more value on achieving higher recall values.

19.
Parker CB, Delong ER. Biometrics 2000, 56(4):996-1001
Changes in maximum likelihood parameter estimates due to deletion of individual observations are useful statistics, both for regression diagnostics and for computing robust estimates of covariance. For many likelihoods, including those in the exponential family, these delete-one statistics can be approximated analytically from a one-step Newton-Raphson iteration on the full maximum likelihood solution. But for general conditional likelihoods and the related Cox partial likelihood, the one-step method does not reduce to an analytic solution. For these likelihoods, an alternative analytic approximation that relies on an appropriately augmented design matrix has been proposed. In this paper, we extend the augmentation approach to explicitly deal with discrete failure-time models. In these models, an individual subject may contribute information at several time points, thereby appearing in multiple risk sets before eventually experiencing a failure or being censored. Our extension also allows the covariates to be time dependent. The new augmentation requires no additional computational resources while improving results.

20.

Background  

Incorrectly annotated sequence data are becoming more commonplace as databases increasingly rely on automated techniques for annotation. Hence, there is an urgent need for computational methods for checking consistency of such annotations against independent sources of evidence and detecting potential annotation errors. We show how a machine learning approach designed to automatically predict a protein's Gene Ontology (GO) functional class can be employed to identify potential gene annotation errors.
