Found 20 similar articles (search time: 15 ms)
1.
2.
Background
Inferring gene regulatory networks from data requires the development of algorithms devoted to structure extraction. When only static data are available, gene interactions may be modelled by a Bayesian Network (BN) that represents the presence of direct interactions from regulators to regulees by conditional probability distributions. We used enhanced evolutionary algorithms to stochastically evolve a set of candidate BN structures and found the model that best fits data without prior knowledge.
3.
A Crombach KR Wotton D Cicin-Sain M Ashyraliyev J Jaeger 《PLoS computational biology》2012,8(7):e1002589
Understanding the complex regulatory networks underlying development and evolution of multi-cellular organisms is a major problem in biology. Computational models can be used as tools to extract the regulatory structure and dynamics of such networks from gene expression data. This approach is called reverse engineering. It has been successfully applied to many gene networks in various biological systems. However, to reconstitute the structure and non-linear dynamics of a developmental gene network in its spatial context remains a considerable challenge. Here, we address this challenge using a case study: the gap gene network involved in segment determination during early development of Drosophila melanogaster. A major problem for reverse-engineering pattern-forming networks is the significant amount of time and effort required to acquire and quantify spatial gene expression data. We have developed a simplified data processing pipeline that considerably increases the throughput of the method, but results in data of reduced accuracy compared to those previously used for gap gene network inference. We demonstrate that we can infer the correct network structure using our reduced data set, and investigate minimal data requirements for successful reverse engineering. Our results show that timing and position of expression domain boundaries are the crucial features for determining regulatory network structure from data, while it is less important to precisely measure expression levels. Based on this, we define minimal data requirements for gap gene network inference. Our results demonstrate the feasibility of reverse-engineering with much reduced experimental effort. This enables more widespread use of the method in different developmental contexts and organisms. Such systematic application of data-driven models to real-world networks has enormous potential. 
Only the quantitative investigation of a large number of developmental gene regulatory networks will allow us to discover whether there are rules or regularities governing development and evolution of complex multi-cellular organisms.
4.
Motivation: Pair-wise residue-residue contacts in proteins can be predicted from both threading templates and sequence-based machine learning. However, most structure modeling approaches only use the template-based contact predictions in guiding the simulations; this is partly because the sequence-based contact predictions are usually considered to be less accurate than that by threading. With the rapid progress in sequence databases and machine-learning techniques, it is necessary to have a detailed and comprehensive assessment of the contact-prediction methods in different template conditions. Results: We develop two methods for protein-contact predictions: SVM-SEQ is a sequence-based machine learning approach which trains a variety of sequence-derived features on contact maps; SVM-LOMETS collects consensus contact predictions from multiple threading templates. We test both methods on the same set of 554 proteins which are categorized into Easy, Medium, Hard and Very Hard targets based on the evolutionary and structural distance between templates and targets. For the Easy and Medium targets, SVM-LOMETS obviously outperforms SVM-SEQ; but for the Hard and Very Hard targets, the accuracy of the SVM-SEQ predictions is higher than that of SVM-LOMETS by 12–25%. If we combine the SVM-SEQ and SVM-LOMETS predictions together, the total number of correctly predicted contacts in the Hard proteins will increase by more than 60% (or 70% for the long-range contact with a sequence separation 24), compared with SVM-LOMETS alone. The advantage of SVM-SEQ is also shown in the CASP7 free modeling targets where the SVM-SEQ is around four times more accurate than SVM-LOMETS in the long-range contact prediction. These data demonstrate that the state-of-the-art sequence-based contact prediction has reached a level which may be helpful in assisting tertiary structure modeling for the targets which do not have close structure templates. The maximum yield should be obtained by the combination of both sequence- and template-based predictions.
Contact: yzhang{at}ku.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Anna Tramontano
5.
Jongrae Kim Declan G Bates Ian Postlethwaite Pat Heslop-Harrison Kwang-Hyun Cho 《BMC bioinformatics》2007,8(1):8
Background
We consider the problem of identifying the dynamic interactions in biochemical networks from noisy experimental data. Typically, approaches for solving this problem make use of an estimation algorithm such as the well-known linear Least-Squares (LS) estimation technique. We demonstrate that when time-series measurements are corrupted by white noise and/or drift noise, more accurate and reliable identification of network interactions can be achieved by employing an estimation algorithm known as Constrained Total Least Squares (CTLS). The Total Least Squares (TLS) technique is a generalised least squares method to solve an overdetermined set of equations whose coefficients are noisy. The CTLS is a natural extension of TLS to the case where the noise components of the coefficients are correlated, as is usually the case with time-series measurements of concentrations and expression profiles in gene networks.
6.
7.
Karin Radrich Yoshimasa Tsuruoka Paul Dobson Albert Gevorgyan Neil Swainston Gino Baart Jean-Marc Schwartz 《BMC systems biology》2010,4(1):114
Background
Genome-scale metabolic reconstructions have been recognised as a valuable tool for a variety of applications ranging from metabolic engineering to evolutionary studies. However, the reconstruction of such networks remains an arduous process requiring a high level of human intervention. This process is further complicated by occurrences of missing or conflicting information and the absence of common annotation standards between different data sources.
8.
EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference
EFICAz (Enzyme Function Inference by Combined Approach) is an automatic engine for large-scale enzyme function inference that combines predictions from four different methods developed and optimized to achieve high prediction accuracy: (i) recognition of functionally discriminating residues (FDRs) in enzyme families obtained by a Conservation-controlled HMM Iterative procedure for Enzyme Family classification (CHIEFc), (ii) pairwise sequence comparison using a family specific Sequence Identity Threshold, (iii) recognition of FDRs in Multiple Pfam enzyme families, and (iv) recognition of multiple Prosite patterns of high specificity. For FDR (i.e. conserved positions in an enzyme family that discriminate between true and false members of the family) identification, we have developed an Evolutionary Footprinting method that uses evolutionary information from homofunctional and heterofunctional multiple sequence alignments associated with an enzyme family. The FDRs show a significant correlation with annotated active site residues. In a jackknife test, EFICAz shows high accuracy (92%) and sensitivity (82%) for predicting four EC digits in testing sequences that are <40% identical to any member of the corresponding training set. Applied to Escherichia coli genome, EFICAz assigns more detailed enzymatic function than KEGG, and generates numerous novel predictions.
9.
10.
A comprehensive approach for drug safety assessment
Li AP 《Chemico-biological interactions》2004,150(1):27-33
A comprehensive, multidisciplinary approach is proposed here for the development of a drug with an acceptable safety profile. Key parameters to be considered for drug safety evaluation based on this comprehensive approach include the following: (1) Pharmacology: Possible toxicity due to drug-target interactions, including interactions with unintended molecular targets, or with molecular targets in unintended organs. (2) Chemistry: Chemical scaffolding and side-chains with safety concerns. (3) Toxicology: Toxicity in animals in vivo, and in relevant animal and human cells in culture. (4) Drug metabolism and pharmacokinetics: Safety concerns due to toxification or detoxification, organ distribution, clearance and pharmacokinetic drug-drug interactions. (5) Risk factors: Physiological, environmental and genetic factors that may enhance a patient's susceptibility. It is proposed that this integrated, multidisciplinary approach to safety evaluation may enhance the accuracy of the prediction of drug safety and thereby the efficiency of drug development.
11.
Background
Microarray data discretization is a basic preprocess for many algorithms of gene regulatory network inference. Some common discretization methods in informatics are used to discretize microarray data. Selection of the discretization method is often arbitrary and no systematic comparison of different discretization has been conducted, in the context of gene regulatory network inference from time series gene expression data.
12.
13.
The integration of various types of genomic data into predictive models of biological networks is one of the main challenges currently faced by computational biology. Constraint-based models in particular play a key role in the attempt to obtain a quantitative understanding of cellular metabolism at genome scale. In essence, their goal is to frame the metabolic capabilities of an organism based on minimal assumptions that describe the steady states of the underlying reaction network via suitable stoichiometric constraints, specifically mass balance and energy balance (i.e. thermodynamic feasibility). The implementation of these requirements to generate viable configurations of reaction fluxes and/or to test given flux profiles for thermodynamic feasibility can however prove to be computationally intensive. We propose here a fast and scalable stoichiometry-based method to explore the Gibbs energy landscape of a biochemical network at steady state. The method is applied to the problem of reconstructing the Gibbs energy landscape underlying metabolic activity in the human red blood cell, and to that of identifying and removing thermodynamically infeasible reaction cycles in the Escherichia coli metabolic network (iAF1260). In the former case, we produce consistent predictions for chemical potentials (or log-concentrations) of intracellular metabolites; in the latter, we identify a restricted set of loops (23 in total) in the periplasmic and cytoplasmic core as the origin of thermodynamic infeasibility in a large sample (10^6) of flux configurations generated randomly and compatibly with the prior information available on reaction reversibility.
14.
Reverse engineering the whole-genome networks of complex multicellular organisms continues to remain a challenge. While simpler models easily scale to large number of genes and gene expression datasets, more accurate models are compute intensive limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software Gene Network Analyzer (GeNA) for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web.
15.
16.
17.
Rzhetsky A Koike T Kalachikov S Gomez SM Krauthammer M Kaplan SH Kra P Russo JJ Friedman C 《Bioinformatics (Oxford, England)》2000,16(12):1120-1128
MOTIVATION: In order to aid in hypothesis-driven experimental gene discovery, we are designing a computer application for the automatic retrieval of signal transduction data from electronic versions of scientific publications using natural language processing (NLP) techniques, as well as for visualizing and editing representations of regulatory systems. These systems describe both signal transduction and biochemical pathways within complex multicellular organisms, yeast, and bacteria. This computer application in turn requires the development of a domain-specific ontology, or knowledge model. RESULTS: We introduce an ontological model for the representation of biological knowledge related to regulatory networks in vertebrates. We outline a taxonomy of the concepts, define their 'whole-to-part' relationships, describe the properties of major concepts, and outline a set of the most important axioms. The ontology is partially realized in a computer system designed to aid researchers in biology and medicine in visualizing and editing a representation of a signal transduction system.
18.
19.
Nitrogen (N) is an important nutrient and signal for plant growth and development. However, to date, our knowledge of how plants sense and transduce the N signals is very limited. To better understand the molecular mechanisms of plant N responses, we took two-dimensional gel-based proteomic and phosphoproteomic approaches to profile the proteins with abundance and phosphorylation state changes during nitrate deprivation and recovery in the model plant Arabidopsis thaliana. After 7-day-old seedlings were N-deprived for up to 48 h followed by 24 h recovery, a total of 170 and 38 proteins were identified with significant changes in abundance and phosphorylation state, respectively. Bioinformatic analyses implicate these proteins in diverse cellular processes including N and protein metabolisms, photosynthesis, cytoskeleton, redox homeostasis, and signal transduction. Functional studies of the selected nitrate-responsive proteins indicate that the proteasome regulatory subunit RPT5a and the cytoskeleton protein Tubulin alpha-6 (TUA6) play important roles in plant nitrate responses by regulating plant N use efficiency (NUE) and low nitrate-induced anthocyanin biosynthesis, respectively. In conclusion, our study provides novel insights into plant responses to nitrate at the proteome level, which are expected to be highly useful for dissecting the N response pathways in higher plants and for improving plant NUE.
20.
RRW: repeated random walks on genome-scale protein networks for local cluster discovery