Similar literature
 20 similar documents found; search took 15 ms
1.
Jung S  Lee KH  Lee D 《Bio Systems》2007,90(1):197-210
The Bayesian network is a popular tool for describing relationships between data entities by representing probabilistic (in)dependencies with a directed acyclic graph (DAG) structure. Relationships have been inferred between biological entities using the Bayesian network model with high-throughput data from biological systems in diverse fields. However, the scalability of those approaches is seriously restricted because of the huge search space for finding an optimal DAG structure in the process of Bayesian network learning. For this reason, most previous approaches limit the number of target entities or use additional knowledge to restrict the search space. In this paper, we use the hierarchical clustering and order restriction (H-CORE) method for the learning of large Bayesian networks by clustering entities and restricting edge directions between those clusters, with the aim of overcoming the scalability problem and thus making it possible to perform genome-scale Bayesian network analysis without additional biological knowledge. We use simulations to show that H-CORE is much faster than the widely used sparse candidate method, whilst being of comparable quality. We have also applied H-CORE to retrieving gene-to-gene relationships in a biological system (the 'Rosetta compendium'). By evaluating learned information through literature mining, we demonstrate that H-CORE enables the genome-scale Bayesian analysis of biological systems without any prior knowledge.

2.
Contact-tracing data (CTD) collected from disease outbreaks has received relatively little attention in the epidemic modeling literature because it is thought to be unreliable: infection sources might be wrongly attributed, or data might be missing due to resource constraints in the questionnaire exercise. Nevertheless, these data might provide a rich source of information on the disease transmission rate. This paper presents a novel methodology for combining CTD with rate-based contact network data to improve posterior precision, and therefore predictive accuracy. We present an advancement in Bayesian inference for epidemics that assimilates these data and is robust to partial contact tracing. Using a simulation study based on the British poultry industry, we show how the presence of CTD improves posterior predictive accuracy and can directly inform a more effective control strategy.

3.
In this paper, we apply the entitymetrics model to our constructed Gene-Citation-Gene (GCG) network. Based on the premise that there is a hidden, but plausible, relationship between an entity in one article and an entity in its citing article, we constructed a GCG network of gene pairs implicitly connected through citation. We compare the performance of this GCG network to a gene-gene (GG) network constructed over the same corpus but which uses gene pairs explicitly connected through traditional co-occurrence. Using 331,411 MEDLINE abstracts collected from 18,323 seed articles and their references, we identify 25 gene pairs. A comparison of these pairs with interactions found in BioGRID reveals that 96% of the gene pairs in the GCG network have known interactions. We measure network performance using degree, weighted degree, closeness, betweenness centrality and PageRank. Combining all measures, we find the GCG network has more gene pairs, but a lower matching rate than the GG network. However, combining top ranked genes in both networks produces a matching rate of 35.53%. By visualizing both the GG and GCG networks, we find that cancer is the most dominant disease associated with the genes in both networks. Overall, the study indicates that the GCG network can be useful for detecting gene interaction in an implicit manner.

4.
5.
Exposure to chemicals in the environment is believed to play a critical role in the etiology of many human diseases. To enhance understanding about environmental effects on human health, the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) provides unique curated data that enable development of novel hypotheses about the relationships between chemicals and diseases. CTD biocurators read the literature and curate direct relationships between chemicals-genes, genes-diseases, and chemicals-diseases. These direct relationships are then computationally integrated to create additional inferred relationships; for example, a direct chemical-gene statement can be combined with a direct gene-disease statement to generate a chemical-disease inference (inferred via the shared gene). In CTD, the number of inferences has increased exponentially as the number of direct chemical, gene and disease interactions has grown. To help users navigate and prioritize these inferences for hypothesis development, we implemented a statistic to score and rank them based on the topology of the local network consisting of the chemical, disease and each of the genes used to make an inference. In this network, chemicals, diseases and genes are nodes connected by edges representing the curated interactions. Like other biological networks, node connectivity is an important consideration when evaluating the CTD network, as the connectivity of nodes follows the power-law distribution. Topological methods reduce the influence of highly connected nodes that are present in biological networks. We evaluated published methods that used local network topology to determine the reliability of protein–protein interactions derived from high-throughput assays. We developed a new metric that combines and weights two of these methods and uniquely takes into account the number of common neighbors and the connectivity of each entity involved. 
We present several CTD inferences as case studies to demonstrate the value of this metric and the biological relevance of the inferences.
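
The inference step described in this abstract (a direct chemical-gene statement combined with a direct gene-disease statement to yield a chemical-disease inference via the shared gene) can be sketched as a simple join. This is only an illustration under stated assumptions: the function name, the tuple-based input format, and the toy identifiers are hypothetical, and the sketch does not implement the abstract's ranking statistic, whose formula is not given here.

```python
from collections import defaultdict

def infer_chemical_disease(chem_gene, gene_disease):
    """Join (chemical, gene) and (gene, disease) pairs on the shared gene.

    Returns {(chemical, disease): set of genes supporting the inference}.
    Hypothetical helper for illustration; not CTD's actual pipeline.
    """
    # Index diseases by gene so each chemical-gene pair is a single lookup.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferences = defaultdict(set)
    for chemical, gene in chem_gene:
        for disease in diseases_by_gene.get(gene, ()):
            inferences[(chemical, disease)].add(gene)
    return dict(inferences)

# Made-up toy data, not curated CTD content.
chem_gene = [("chemA", "G1"), ("chemA", "G2"), ("chemB", "G2")]
gene_disease = [("G1", "diseaseX"), ("G2", "diseaseX"), ("G2", "diseaseY")]
result = infer_chemical_disease(chem_gene, gene_disease)
```

In this toy data the pair ("chemA", "diseaseX") is supported by two shared genes, which is the kind of local-network information a topology-based score could then weigh.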

6.
The C-terminal domain (CTD) of the largest subunit in DNA-dependent RNA polymerase II (RNAP II) is essential for mRNA synthesis and processing, through coordination of an astounding array of protein-protein interactions. Not surprisingly, CTD mutations can have complex, pleiotropic impacts on phenotype. For example, insertions of five alanine residues between CTD diheptads in yeast, which alter the CTD's overall tandem structure and physically separate core functional units, dramatically reduce growth rate and result in abnormally large cells that accumulate increased DNA content over time. Patterns by which specific CTD-protein interactions are disrupted by changes in CTD structure, as well as how downstream metabolic pathways are impacted, are difficult to target for direct experimental analyses. In an effort to connect an altered CTD to complex but quantifiable phenotypic changes, we applied network analyses of genes that are differentially expressed in our five alanine CTD mutant, combined with established genetic interactions from the Saccharomyces cerevisiae Genome Database (SGD). We were able to identify candidate genetic pathways, and several key genes, that could explain how this change in CTD structure leads to the specific phenotypic changes observed. These hypothetical networks identify links between CTD-associated proteins and mitotic function, control of cell cycle checkpoint mechanisms, and expression of cell wall and membrane components. Such results can help to direct future genetic and biochemical investigations that tie together the complex impacts of the CTD on global cellular metabolism.

7.
8.
Bell L  Chowdhary R  Liu JS  Niu X  Zhang J 《PloS one》2011,6(6):e21474
A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein-protein interactions, protein/gene regulations, protein-small molecule interactions, protein-GO relationships, protein-pathway relationships, and pathway-disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses--the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs.
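
The abstract names a most probable path (MPP) algorithm without giving its details. One standard way to find the highest-probability path in a graph whose edges carry probabilities is to run Dijkstra's algorithm on the negative log of each probability, since maximizing a product of probabilities is equivalent to minimizing the sum of their negative logs. The sketch below shows that generic technique only; it is an assumption that it resembles the authors' MPP, and the graph layout, function name, and toy probabilities are hypothetical.

```python
import heapq
import math

def most_probable_path(graph, source, target):
    """Highest-probability path via Dijkstra on -log(p).

    graph: {node: {neighbor: probability}}, each probability in (0, 1].
    Returns (probability, path), or (0.0, []) if target is unreachable.
    """
    best = {source: 0.0}   # smallest summed -log(p) found so far per node
    prev = {}              # predecessor on the best known path
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, p in graph.get(node, {}).items():
            new_cost = cost - math.log(p)
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in best:
        return 0.0, []
    # Walk predecessors back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-best[target]), path

# Made-up toy network; node names and probabilities are illustrative only.
graph = {
    "protein": {"pathway": 0.9, "go_term": 0.5},
    "pathway": {"disease": 0.8},
    "go_term": {"disease": 0.9},
}
prob, path = most_probable_path(graph, "protein", "disease")
```

Here the route through "pathway" has probability 0.9 × 0.8 = 0.72 and the route through "go_term" has 0.5 × 0.9 = 0.45, so the first is returned as the candidate indirect relationship.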

9.
10.
11.
12.
Extraction of biological interaction networks from scientific literature   Total citations: 2, self-citations: 0, citations by others: 2
Biology can be regarded as a science of networks: interactions between various biological entities (e.g. genes, proteins, metabolites) on different levels (e.g. gene regulation, cell signalling) can be represented as graphs and, thus, analysis of such networks might shed new light on the function of biological systems. Such biological networks can be obtained from different sources. The extraction of networks from text is an important technique that requires the integration of several different computational disciplines. This paper summarises the most important steps in network extraction and reviews common approaches and solutions for the extraction of biological networks from scientific literature.

13.
The type IX secretion system (T9SS) of Porphyromonas gingivalis secretes proteins possessing a conserved C-terminal domain (CTD) to the cell surface. The C-terminal signal is essential for these proteins to translocate across the outer membrane via the T9SS. On the surface the CTD of these proteins is cleaved prior to extensive glycosylation. It is believed that the modification on these CTD proteins is anionic lipopolysaccharide (A-LPS), which enables the attachment of CTD proteins to the cell surface. However, the exact site of modification and the mechanism of attachment of CTD proteins to the cell surface are unknown. In this study we characterized two wbaP (PG1964) mutants that did not synthesise A-LPS and accumulated CTD proteins in the clarified culture fluid (CCF). The CTDs of the CTD proteins in the CCF were cleaved, suggesting normal secretion; however, the CTD proteins were not glycosylated. Mass spectrometric analysis of CTD proteins purified from the CCF of the wbaP mutants revealed the presence of various peptide/amino acid modifications from the growth medium at the C-terminus of the mature CTD proteins. This suggested that modification occurs at the C-terminus of T9SS substrates in the wild type P. gingivalis. This was confirmed by analysis of CTD proteins from wild type, where a 648 Da linker was identified to be attached at the C-terminus of mature CTD proteins. Importantly, treatment with proteinase K released the 648 Da linker from the CTD proteins, demonstrating a peptide bond between the C-terminus and the modification. Together, this is suggestive of a mechanism similar to sortase A for the cleavage and modification/attachment of CTD proteins in P. gingivalis. PG0026 has been recognized as the CTD signal peptidase and is now proposed to be the sortase-like protein in P. gingivalis. To our knowledge, this is the first biochemical evidence suggesting a sortase-like mechanism in Gram-negative bacteria.

14.
The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency.

15.
Methods for analyzing the amino-acid sequence of a protein for the purposes of predicting its three-dimensional structure were systematically analyzed using knowledge engineering techniques. The resulting entities (data) and relations (processing methods and constraints) have been represented within a generalized dependency network consisting of 29 nodes and over 100 links. It is argued that such a representation meets the requirements of knowledge-based systems in molecular biology. This network is used as the architecture for a prototype knowledge-based system that simulates logically the processes used in protein structure prediction. Although developed specifically for applications in protein structure prediction, the network architecture provides a strategy for tackling the general problem of orchestrating and integrating the diverse sources of knowledge that are characteristic of many areas of science.

16.
DNA MMR (mismatch repair) is an excision repair system that removes mismatched bases generated primarily by failure of the 3'-5' proofreading activity associated with replicative DNA polymerases. MutL proteins homologous to human PMS2 are the endonucleases that introduce the entry point of the excision reaction. Deficiency in PMS2 function is one of the major etiologies of hereditary non-polyposis colorectal cancers in humans. Although recent studies revealed that the CTD (C-terminal domain) of MutL harbours weak endonuclease activity, the regulatory mechanism of this activity remains unknown. In this paper, we characterize in detail the CTD and NTD (N-terminal domain) of aqMutL (Aquifex aeolicus MutL). On the one hand, CTD existed as a dimer in solution and showed weak DNA-binding and Mn2+-dependent endonuclease activities. On the other hand, NTD was monomeric and exhibited a relatively strong DNA-binding activity. It was also clarified that NTD promotes the endonuclease activity of CTD. NTD-mediated activation of CTD was abolished by depletion of the zinc-ion from the reaction mixture or by the substitution of the zinc-binding cysteine residue in CTD with an alanine. On the basis of these results, we propose a model for the intramolecular regulatory mechanism of MutL endonuclease activity.

17.
Frame shift mutations of the polyglutamine binding protein-1 (PQBP1) gene lead to total or partial truncation of the C-terminal domain (CTD) and cause mental retardation in human patients. Interestingly, the normal Drosophila homologue of PQBP-1 lacks the CTD. As a model to analyze the molecular network of PQBP-1 affecting intelligence, we generated transgenic flies expressing human PQBP-1 with the CTD. Pavlovian olfactory conditioning revealed that the transgenic flies showed disturbance of long-term memory. In addition, they showed abnormal courtship behavior in which male flies follow other male flies. Abnormal functions of PQBP-1 or its binding partner might be linked to these symptoms.

18.
19.
Yvain Nicolet  Cécile Tron 《FEBS letters》2010,584(19):4197-4202
HydG uses tyrosine to synthesize the CN/CO ligands of [FeFe]-hydrogenase active site. We have mutated two of the [4Fe-4S]-cluster cysteine ligands of the HydG C-terminal domain (CTD) to serine. The double mutant can still synthesize CN but not CO. In a mutant lacking the CTD both CN and CO synthesis are abolished. Like in ThiH, the initial steps of CN synthesis are carried out in the TIM-barrel domain of HydG but some component(s) of the CTD are later needed. The mutants indicate that CO synthesis is metal-based and occurs in the CTD. We postulate that CN/CO synthesis is initiated by H2N-CH-. Fragmentation of this radical into H2N-CH2 and CO2 or H2CNH and provides plausible precursors for CN/CO synthesis.

20.
ABSTRACT: BACKGROUND: The representation of a biochemical system as a network is the precursor of any mathematical model of the processes driving the dynamics of that system. Pharmacokinetics uses mathematical models to describe the interactions between a drug, its metabolites, and their targets and, through the simulation of these models, predicts drug levels and/or dynamic behaviors of drug entities in the body. Therefore, the development of computational techniques for inferring the interaction network of the drug entities and its kinetic parameters from observational data is attracting great interest in the scientific community of pharmacologists. Network inference is a set of mathematical procedures that deduce the structure of a model from the experimental data associated with the nodes of the network of interactions. In this paper, we deal with the inference of a pharmacokinetic network from the concentrations of the drug and its metabolites observed at discrete time points. RESULTS: The method of network inference presented in this paper is inspired by the theory of time-lagged correlation inference with regard to the deduction of the interaction network, and by a maximum likelihood approach with regard to the estimation of the kinetic parameters of the network. Both network inference and parameter estimation have been designed specifically to identify systems of biotransformations, at the biochemical level, from noisy time-resolved experimental data. We use our inference method to deduce the metabolic pathway of gemcitabine. The inputs to our inference algorithm are the experimental time series of the concentration of gemcitabine and its metabolites. The output is the set of reactions of the metabolic network of gemcitabine.
CONCLUSIONS: Time-lagged correlation-based inference paired with a probabilistic model of parameter inference from metabolite time series allows the identification of the microscopic pharmacokinetics and pharmacodynamics of a drug with minimal a priori knowledge. In fact, the inference model presented in this paper is completely unsupervised. It takes as input the time series of the concentrations of the parent drug and its metabolites. The method, applied to the case study of gemcitabine pharmacokinetics, shows good accuracy and sensitivity.
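
The core idea of time-lagged correlation can be illustrated with a short sketch: shift one concentration series against another and keep the lag at which the Pearson correlation is strongest, on the intuition that a product tends to follow its precursor after some delay. This is a generic illustration, not the authors' algorithm; the function names, the restriction to non-negative lags, and the toy series are all assumptions, and the maximum-likelihood estimation of kinetic parameters is not covered.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lag(x, y, max_lag):
    """Return (lag, r) maximizing |corr(x[t], y[t + lag])| for 0 <= lag <= max_lag."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        # Align x at time t with y at time t + lag by trimming both ends.
        xs, ys = x[:len(x) - lag], y[lag:]
        r = pearson(xs, ys)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Made-up concentrations: the "metabolite" is the "parent" delayed by 2 steps.
parent = [0, 1, 4, 9, 6, 3, 1, 0, 0, 0]
metabolite = [0, 0, 0, 1, 4, 9, 6, 3, 1, 0]
lag, r = best_lag(parent, metabolite, max_lag=3)
```

A strong correlation at a positive lag is only suggestive of a precursor-product relationship; with noisy data, a significance threshold or a further model-fitting step would be needed before accepting an edge.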
