Found 20 similar articles.
1.
2.
3.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach is favorably tested in two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.
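The weighting idea in this abstract can be illustrated with a small sketch. It shows only how per-dataset weights combine several kernel matrices and how a zero weight drops a dataset; it is not the expectation-maximization procedure from the paper, and the function name `combine_kernels` and the toy matrices are hypothetical.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square kernel matrices.

    Illustrative helper (not from the paper): a dataset whose weight is
    zero contributes nothing, which is how weighting can exclude it.
    """
    if len(kernels) != len(weights):
        raise ValueError("one weight is required per kernel matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        if weight == 0:
            continue  # excluded dataset
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Toy similarity matrices over two proteins (made-up values).
expression_kernel = [[1.0, 0.8], [0.8, 1.0]]
noisy_kernel = [[1.0, -0.5], [-0.5, 1.0]]
integrated = combine_kernels([expression_kernel, noisy_kernel], [1.0, 0.0])
```

With non-negative weights, the sum of positive semidefinite kernels is again positive semidefinite, so the combined matrix remains a valid kernel.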
4.
5.
Gene network inference from incomplete expression data: transcriptional control of hematopoietic commitment (Total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: The topology and function of gene regulation networks are commonly inferred from time series of gene expression levels in cell populations. This strategy is usually invalid if the gene expression in different cells of the population is not synchronous. A promising, though technically more demanding alternative is therefore to measure the gene expression levels in single cells individually. The inference of a gene regulation network requires knowledge of the gene expression levels at successive time points, at least before and after a network transition. However, owing to experimental limitations a complete determination of the precursor state is not possible. RESULTS: We investigate a strategy for the inference of gene regulatory networks from incomplete expression data based on dynamic Bayesian networks. This permits prediction of the number of experiments necessary for network inference depending on parameters including noise in the data, prior knowledge and limited attainability of initial states. Our strategy combines a gradual 'Partial Learning' approach based solely on true experimental observations for the network topology with expectation maximization for the network parameters. We illustrate our strategy by extensive computer simulations in a high-dimensional parameter space in a simulated single-cell-based example of hematopoietic stem cell commitment and in random networks of different sizes. We find that the feasibility of network inferences increases significantly with the experimental ability to force the system into different initial network states, with prior knowledge and with noise reduction. AVAILABILITY: Source code is available under: www.izbi.uni-leipzig.de/services/NetwPartLearn.html SUPPLEMENTARY INFORMATION: Supplementary Data are available at Bioinformatics online.
6.
Background
All currently available methods of network/association inference from microarray gene expression measurements implicitly assume that such measurements represent the actual expression levels of different genes within each cell included in the biological sample under study. Contrary to this common belief, modern microarray technology produces signals aggregated over a random number of individual cells, a "nitty-gritty" aspect of such arrays, thereby causing a random effect that distorts the correlation structure of intra-cellular gene expression levels.
7.
8.
Bolan Linghu Evan S Snitkin Dustin T Holloway Adam M Gustafson Yu Xia Charles DeLisi 《BMC bioinformatics》2008,9(1):119
Background
Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques, and (2) development of a decision rule for the constructed FLN to optimize functional annotation.
9.
A Bayesian regression approach to the inference of regulatory networks from gene expression data (Total citations: 3; self-citations: 0; citations by others: 3)
MOTIVATION: There is currently much interest in reverse-engineering regulatory relationships between genes from microarray expression data. We propose a new algorithmic method for inferring such interactions between genes using data from gene knockout experiments. The algorithm we use is the Sparse Bayesian regression algorithm of Tipping and Faul. This method is highly suited to this problem as it does not require the data to be discretized, overcomes the need for an explicit topology search and, most importantly, requires no heuristic thresholding of the discovered connections. RESULTS: Using simulated expression data, we are able to show that this algorithm outperforms a recently published correlation-based approach. Crucially, it does this without the need to set any ad hoc threshold on possible connections.
10.
MOTIVATION: The inference of genes that are truly associated with inherited human diseases from a set of candidates resulting from genetic linkage studies has been one of the most challenging tasks in human genetics. Although several computational approaches have been proposed to prioritize candidate genes relying on protein-protein interaction (PPI) networks, these methods can usually cover less than half of known human genes. RESULTS: We propose to rely on the biological process domain of the gene ontology to construct a gene semantic similarity network and then use the network to infer disease genes. We show that the constructed network covers about 50% more genes than a typical PPI network. By analyzing the gene semantic similarity network with the PPI network, we show that gene pairs tend to have higher semantic similarity scores if the corresponding proteins are closer to each other in the PPI network. By analyzing the gene semantic similarity network with a phenotype similarity network, we show that semantic similarity scores of genes associated with similar diseases are significantly different from those of genes selected at random, and that genes with higher semantic similarity scores tend to be associated with diseases with higher phenotype similarity scores. We further use the gene semantic similarity network with a random walk with restart model to infer disease genes. Through a series of large-scale leave-one-out cross-validation experiments, we show that the gene semantic similarity network can achieve not only higher coverage but also higher accuracy than the PPI network in the inference of disease genes.
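For readers unfamiliar with the random walk with restart model mentioned in this abstract, the following is a minimal sketch of the standard iteration p_next = (1 - r) * W * p + r * p0 on a small weighted network. The function name, the restart probability of 0.3, the column normalization and the toy graph are assumptions for illustration; the abstract does not state the authors' settings.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state proximity to the seed nodes.

    adjacency: square list of lists with non-negative edge weights
               (for example, similarity scores between genes).
    seeds:     indices of nodes known to be relevant (for example, known disease genes).
    """
    n = len(adjacency)
    # Column-normalize so each column describes the walker's next-step probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # The walker starts, and restarts, uniformly on the seed nodes.
    start = [0.0] * n
    for s in seeds:
        start[s] = 1.0 / len(seeds)
    scores = start[:]
    for _ in range(max_iter):
        updated = [
            (1.0 - restart) * sum(transition[i][j] * scores[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
ranking_scores = random_walk_with_restart(path_graph, seeds=[0])
```

Candidate genes would then be ranked by their score, so nodes closer to the seeds in the network rank higher.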
11.
12.
13.
14.
MicroRNAs (miRNAs) regulate a large proportion of mammalian genes by hybridizing to targeted messenger RNAs (mRNAs) and down-regulating their translation into protein. Although much work has been done in the genome-wide computational prediction of miRNA genes and their target mRNAs, an open question is how to efficiently obtain functional miRNA targets from a large number of candidate miRNA targets predicted by existing computational algorithms. In this paper, we propose a novel Bayesian model and learning algorithm, GenMiR++ (Generative model for miRNA regulation), that accounts for patterns of gene expression using miRNA expression data and a set of candidate miRNA targets. A set of high-confidence functional miRNA targets are then obtained from the data using a Bayesian learning algorithm. Our model scores 467 high-confidence miRNA targets out of 1,770 targets obtained from TargetScanS in mouse at a false detection rate of 2.5%: several confirmed miRNA targets appear in our high-confidence set, such as the interactions between miR-92 and the signal transduction gene MAP2K4, as well as the relationship between miR-16 and BCL2, an anti-apoptotic gene which has been implicated in chronic lymphocytic leukemia. We present results on the robustness of our model showing that our learning algorithm is not sensitive to various perturbations of the data. Our high-confidence targets represent a significant increase in the number of miRNA targets and represent a starting point for a global understanding of gene regulation.
15.
16.
Community structure and diversity of tropical forest mammals: data from a global camera trap network
Ahumada JA Silva CE Gajapersad K Hallam C Hurtado J Martin E McWilliam A Mugerwa B O'Brien T Rovero F Sheil D Spironello WR Winarni N Andelman SJ 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2011,366(1578):2703-2711
Terrestrial mammals are a key component of tropical forest communities as indicators of ecosystem health and providers of important ecosystem services. However, there is little quantitative information about how they change with local, regional and global threats. In this paper, the first standardized pantropical forest terrestrial mammal community study, we examine several aspects of terrestrial mammal species and community diversity (species richness, species diversity, evenness, dominance, functional diversity and community structure) at seven sites around the globe using a single standardized camera trapping methodology approach. The sites, located in Uganda, Tanzania, Indonesia, Lao PDR, Suriname, Brazil and Costa Rica, are surrounded by different landscape configurations, from continuous forests to highly fragmented forests. We obtained more than 51 000 images and detected 105 species of mammals with a total sampling effort of 12 687 camera trap days. We find that mammal communities from highly fragmented sites have lower species richness, species diversity, functional diversity and higher dominance when compared with sites in partially fragmented and continuous forest. We emphasize the importance of standardized camera trapping approaches for obtaining baselines for monitoring forest mammal communities so as to adequately understand the effect of global, regional and local threats and appropriately inform conservation actions.
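Several of the community measures named in this abstract have common textbook definitions, sketched below for a single site. The abstract does not say which indices the authors used, so the choices here (Shannon index for diversity, Pielou's index for evenness, Berger-Parker index for dominance), the function name and the detection counts are all assumptions for illustration.

```python
import math


def diversity_summary(counts):
    """Summarize one site's community from per-species detection counts."""
    present = [c for c in counts if c > 0]
    if not present:
        raise ValueError("at least one species must have a positive count")
    total = sum(present)
    proportions = [c / total for c in present]
    richness = len(present)  # number of species detected
    shannon = -sum(p * math.log(p) for p in proportions)  # Shannon diversity H'
    # Pielou's evenness J' = H' / ln(S); defined here as 0 for a single species.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    dominance = max(proportions)  # Berger-Parker dominance
    return {
        "richness": richness,
        "shannon": shannon,
        "evenness": evenness,
        "dominance": dominance,
    }


# Made-up counts: one community dominated by a single species, one perfectly even.
fragmented_site = diversity_summary([90, 5, 3, 2])
continuous_site = diversity_summary([25, 25, 25, 25])
```

In this toy comparison the dominated community has lower Shannon diversity and higher dominance than the even one, which is the direction of the contrast the abstract reports for highly fragmented sites.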
17.
This article presents a new modeling strategy in functional data analysis. We consider the problem of estimating an unknown smooth function given functional data with noise. The unknown function is treated as the realization of a stochastic process, which is incorporated into a diffusion model. The method of smoothing spline estimation is connected to a special case of this approach. The resulting models offer great flexibility to capture the dynamic features of functional data, and allow straightforward and meaningful interpretation. The likelihood of the models is derived with Euler approximation and data augmentation. A unified Bayesian inference method is carried out via a Markov chain Monte Carlo algorithm including a simulation smoother. The proposed models and methods are illustrated on some prostate-specific antigen data, where we also show how the models can be used for forecasting.
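The Euler approximation mentioned in this abstract can be illustrated with the generic Euler-Maruyama scheme for a diffusion dX = a(X) dt + b(X) dW. This sketch only simulates a path; it does not reproduce the article's model, likelihood or data augmentation, and the function name and the mean-reverting example are hypothetical.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, n_steps, rng=None):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW on a fixed grid."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    x = x0
    path = [x]
    for _ in range(n_steps):
        # A Brownian increment over a step of length dt is Normal(0, dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


# Hypothetical example: mean reversion toward zero with constant noise.
sample_path = euler_maruyama(lambda x: -x, lambda x: 0.2, x0=1.0, dt=0.01, n_steps=500)
```

Under this scheme each transition is Gaussian with mean x + a(x) dt and variance b(x)^2 dt, which is what makes an approximate likelihood tractable.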
18.
19.
20.
Constraint-based functional similarity of metabolic genes: going beyond network topology (Total citations: 1; self-citations: 0; citations by others: 1)
Rokhlenko O Shlomi T Sharan R Ruppin E Pinter RY 《Bioinformatics (Oxford, England)》2007,23(16):2139-2146