Similar documents
1.
MOTIVATION: Bayesian methods are widely used in many different areas of research. Recently, they have become a very popular tool for biological network reconstruction, due to their ability to handle noisy data. Even though there are many software packages allowing for Bayesian network reconstruction, only a few of them are freely available to researchers. Moreover, they usually require at least basic programming abilities, which restricts their potential user base. Our goal was to provide software that would be freely available, efficient and usable by non-programmers. RESULTS: We present BNFinder, software that allows for Bayesian network reconstruction from experimental data. It supports dynamic Bayesian networks and, if the variables are partially ordered, also static Bayesian networks. The main advantage of BNFinder is its use of an exact algorithm, which is at the same time very efficient (polynomial with respect to the number of observations).

2.
Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only due to large sample sizes. These methods partition big data sets into subsets and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine, which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods using a Bayesian logistic regression model for simulated data and a Bayesian Gamma model for real data; we also demonstrate features and capabilities of the R package. The package assumes the user has carried out the Bayesian analysis and has produced the independent subposterior samples outside of the package. The methods are primarily suited to models with unknown parameters of fixed dimension that exist in continuous parameter spaces. We envision this tool will allow researchers to explore the various methods for their specific applications and will assist future progress in this rapidly developing field.
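To make the combination step concrete, here is a minimal Python sketch of one such rule, consensus Monte Carlo with per-parameter inverse-variance weights. It is an illustration under stated assumptions, not the parallelMCMCcombine code: the function name `consensus_combine` is hypothetical, the subposterior draws are assumed to be arrays of equal shape (T, d), and parameters are weighted independently (covariances ignored).

```python
import numpy as np


def consensus_combine(subposteriors):
    """Combine subset posterior draws by consensus Monte Carlo weighting.

    subposteriors: list of M arrays, each of shape (T, d), holding T draws of a
    d-dimensional parameter from one data subset. Each parameter is weighted by
    the inverse of its sample variance within a subset (covariances ignored).
    Returns an array of shape (T, d) of combined draws.
    """
    draws = np.stack([np.asarray(s, dtype=float) for s in subposteriors])  # (M, T, d)
    weights = 1.0 / draws.var(axis=1, ddof=1)                              # (M, d)
    weighted_sum = (weights[:, None, :] * draws).sum(axis=0)                # (T, d)
    return weighted_sum / weights.sum(axis=0)                               # (T, d)
```

Draw-by-draw weighted averaging is exact only when every subposterior is Gaussian; in consensus Monte Carlo each subset analysis is typically run with the prior raised to the power 1/M, so that the product of the subposteriors matches the full-data posterior.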

3.
Alterovitz G  Liu J  Afkhami E  Ramoni MF 《Proteomics》2007,7(16):2843-2855
Biological and medical data have been growing exponentially over the past several years [1, 2]. In particular, proteomics has seen automation dramatically change the rate at which data are generated [3]. Analysis that systematically incorporates prior information is becoming essential to making inferences about the myriad, complex data [4-6]. A Bayesian approach can help capture such information and incorporate it seamlessly through a rigorous, probabilistic framework. This paper starts with a review of the background mathematics behind the Bayesian methodology: from parameter estimation to Bayesian networks. The article then goes on to discuss how emerging Bayesian approaches have already been successfully applied to research across proteomics, a field for which Bayesian methods are particularly well suited [7-9]. After reviewing the literature on the subject of Bayesian methods in biological contexts, the article discusses some of the recent applications in proteomics and emerging directions in the field.

4.
High-dimensional data provide many potential confounders that may bolster the plausibility of the ignorability assumption in causal inference problems. Propensity score methods are powerful causal inference tools, which are popular in health care research and are particularly useful for high-dimensional data. Recent interest has surrounded a Bayesian treatment of propensity scores in order to flexibly model the treatment assignment mechanism and summarize posterior quantities while incorporating variance from the treatment model. We discuss methods for Bayesian propensity score analysis of binary treatments, focusing on modern methods for high-dimensional Bayesian regression and the propagation of uncertainty. We introduce a novel and simple estimator for the average treatment effect that capitalizes on conjugacy of the beta and binomial distributions. Through simulations, we show the utility of horseshoe priors and Bayesian additive regression trees paired with our new estimator, while demonstrating the importance of including variance from the treatment regression model. An application to cardiac stent data with almost 500 confounders and 9000 patients illustrates approaches and facilitates comparison with existing alternatives. As measured by a falsifiability endpoint, we improved confounder adjustment compared with past observational research of the same problem.
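The beta-binomial conjugacy mentioned above can be illustrated with a deliberately simple sketch. It is not the estimator proposed in the paper: as an assumption, it subclassifies units into propensity-score strata, updates a Beta(a, b) prior to Beta(a + y, b + n - y) for the outcome rate of each arm within each stratum, and averages the posterior-mean differences weighted by stratum size. The function name `stratified_ate` and the (y1, n1, y0, n0) input layout are hypothetical.

```python
def stratified_ate(strata, a=1.0, b=1.0):
    """Average treatment effect on a binary outcome via propensity-score strata.

    strata: list of (y1, n1, y0, n0) per stratum: outcome events and sample sizes
    for treated (1) and control (0) units. Each arm's outcome rate gets a
    Beta(a, b) prior, updated by conjugacy to Beta(a + y, b + n - y).
    """
    total = sum(n1 + n0 for _, n1, _, n0 in strata)
    ate = 0.0
    for y1, n1, y0, n0 in strata:
        p1 = (a + y1) / (a + b + n1)   # posterior mean rate, treated
        p0 = (a + y0) / (a + b + n0)   # posterior mean rate, control
        ate += (n1 + n0) / total * (p1 - p0)
    return ate
```

Because the outcome update is closed-form, no MCMC is needed for this step; propagating uncertainty from the treatment model would additionally require repeating the stratification for each posterior draw of the propensity scores.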

5.
Yi N  Shriner D 《Heredity》2008,100(3):240-252
Many complex human diseases and traits of biological and/or economic importance are determined by interacting networks of multiple quantitative trait loci (QTL) and environmental factors. Mapping QTL is critical for understanding the genetic basis of complex traits, and for the ultimate identification of the responsible genes. A variety of sophisticated statistical methods for QTL mapping have been developed. Among these developments, the evolution of Bayesian approaches for multiple QTL mapping over the past decade has been remarkable. Bayesian methods can jointly infer the number of QTL, their genomic positions and their genetic effects. Here, we review recently developed and still developing Bayesian methods and associated computer software for mapping multiple QTL in experimental crosses. We compare and contrast these methods to clearly describe the relationships among different Bayesian methods. We conclude this review by highlighting some areas of future research.

6.
Statistical analyses are used in many fields of genetic research. Most geneticists are taught classical statistics, which includes hypothesis testing, estimation and the construction of confidence intervals; this framework has proved more than satisfactory in many ways. What does a Bayesian framework have to offer geneticists? Its utility lies in offering a more direct approach to some questions and the incorporation of prior information. It can also provide a more straightforward interpretation of results. The utility of a Bayesian perspective, especially for complex problems, is becoming increasingly clear to the statistics community; geneticists are also finding this framework useful and are increasingly utilizing the power of this approach.

7.
Bayesian statistics for parasitologists (total citations: 3; self-citations: 0; citations by others: 3)
Bayesian statistical methods are increasingly being used in the analysis of parasitological data. Here, the basis of differences between the Bayesian method and the classical or frequentist approach to statistical inference is explained. This is illustrated with practical implications of Bayesian analyses using prevalence estimation of strongyloidiasis and onchocerciasis as two relevant examples. The strongyloidiasis example addresses the problem of parasitological diagnosis in the absence of a gold standard, whereas the onchocerciasis case focuses on the identification of villages warranting priority mass ivermectin treatment. The advantages and challenges faced by users of the Bayesian approach are also discussed, and readers are pointed to further directions for a more in-depth exploration of the issues raised. We advocate collaboration between parasitologists and Bayesian statisticians as a fruitful and rewarding venture for advancing applied research in parasite epidemiology and the control of parasitic infections.
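Prevalence estimation with an imperfect diagnostic test can be sketched as follows. As a simplifying assumption, the test's sensitivity `se` and specificity `sp` are treated as known and the prior on true prevalence is flat; in a genuine no-gold-standard analysis, `se` and `sp` would themselves receive priors. The probability of a positive result is then pi * se + (1 - pi) * (1 - sp), and the posterior of pi is evaluated on a grid. The function name is hypothetical.

```python
import numpy as np


def prevalence_posterior_mean(positives, n, se, sp, grid=999):
    """Posterior mean of true prevalence pi from an imperfect diagnostic test.

    Assumes known sensitivity `se` and specificity `sp` and a flat prior on pi.
    P(test positive) = pi * se + (1 - pi) * (1 - sp).
    """
    pi = np.linspace(0.0, 1.0, grid + 2)[1:-1]          # interior grid over (0, 1)
    q = pi * se + (1.0 - pi) * (1.0 - sp)               # apparent prevalence
    log_lik = positives * np.log(q) + (n - positives) * np.log1p(-q)  # binomial kernel
    w = np.exp(log_lik - log_lik.max())                 # flat prior: posterior ∝ likelihood
    return float((pi * w).sum() / w.sum())
```

For example, 30 positives out of 100 with se = 0.8 and sp = 0.9 give a posterior mean of about 0.29, below the apparent prevalence of 0.30, because some positives are attributed to false-positive results.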

8.
The restricted mean survival time (RMST) evaluates the expectation of survival time truncated by a prespecified time point, because the mean survival time in the presence of censoring is typically not estimable. The frequentist inference procedure for RMST has been widely advocated for comparison of two survival curves, while research from the Bayesian perspective is rather limited. For the RMST of both right- and interval-censored data, we propose Bayesian nonparametric estimation and inference procedures. By assigning a mixture of Dirichlet processes (MDP) prior to the distribution function, we can estimate the posterior distribution of RMST. We also explore another Bayesian nonparametric approach using the Dirichlet process mixture model and make comparisons with the frequentist nonparametric method. Simulation studies demonstrate that the Bayesian nonparametric RMST under diffuse MDP priors leads to robust estimation and under informative priors it can incorporate prior knowledge into the nonparametric estimator. Analysis of real trial examples demonstrates the flexibility and interpretability of the Bayesian nonparametric RMST for both right- and interval-censored data.
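RMST up to a truncation time tau is the area under the survival curve, RMST(tau) = integral from 0 to tau of S(t) dt. The sketch below evaluates that integral for a right-continuous step survival function; in a Bayesian nonparametric analysis, applying it to each posterior draw of S would give posterior draws of RMST. The function name `rmst` and the (times, surv) representation are assumptions for illustration.

```python
def rmst(times, surv, tau):
    """Restricted mean survival time: the area under S(t) from 0 to tau.

    times: increasing times at which the step survival function drops;
    surv: value of S(t) from each of those times onward (S = 1 before times[0]).
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= tau:                      # later drops do not affect the integral
            break
        area += prev_s * (t - prev_t)     # rectangle up to the next drop
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)  # final rectangle up to tau
```

For instance, with S dropping to 0.5 at t = 2 and to 0.25 at t = 4, RMST(5) = 1 * 2 + 0.5 * 2 + 0.25 * 1 = 3.25.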

9.

Background

The recent advent of high-throughput SNP genotyping technologies has opened new avenues of research for population genetics. In particular, a growing interest in the identification of footprints of selection, based on genome scans for adaptive differentiation, has emerged.

Methodology/Principal Findings

The purpose of this study is to develop an efficient model-based approach to perform Bayesian exploratory analyses for adaptive differentiation in very large SNP data sets. The basic idea is to start with a very simple model for neutral loci that is easy to implement under a Bayesian framework and to identify selected loci as outliers via Posterior Predictive P-values (PPP-values). Applications of this strategy are considered using two different statistical models. The first was initially interpreted in the context of populations evolving under pure genetic drift from a common ancestral population, while the second relies on populations at migration-drift equilibrium. The robustness and power of the two resulting Bayesian model-based approaches to detect SNPs under selection are further evaluated through extensive simulations. An application to a cattle data set is also provided.

Conclusions/Significance

The procedure described turns out to be much faster than former Bayesian approaches and also reasonably efficient, especially for detecting loci under positive selection.
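Outlier detection via PPP-values can be sketched in a few lines. Assumptions: for each locus, a differentiation statistic has been computed on the observed data and on S replicate data sets simulated from S posterior draws of the neutral model; the statistic, the function name `flag_outliers` and the two-sided threshold `alpha` are hypothetical, not taken from the study.

```python
import numpy as np


def flag_outliers(observed, replicated, alpha=0.01):
    """Posterior predictive p-values and outlier flags for L loci.

    observed: shape (L,), statistic computed on the observed data per locus.
    replicated: shape (S, L), the same statistic on data simulated from
        S posterior draws of the neutral model.
    Returns (ppp, flags) where ppp[l] = P(T_rep >= T_obs) for locus l.
    """
    obs = np.asarray(observed, dtype=float)
    rep = np.asarray(replicated, dtype=float)
    ppp = (rep >= obs).mean(axis=0)               # share of replicates at least as large
    flags = (ppp < alpha) | (ppp > 1.0 - alpha)   # two-sided outlier rule (assumption)
    return ppp, flags
```

A PPP-value near 0 means the observed differentiation exceeds almost every neutral replicate, the pattern expected under positive selection; a value near 1 points the other way.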

10.
Chen  Jiahua; Chen  Zehua 《Biometrika》2008,95(3):759-771
The ordinary Bayesian information criterion is too liberal for model selection when the model space is large. In this paper, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayesian information criteria, which take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to infinity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayesian information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayesian information criteria are extremely useful for variable selection in problems with a moderate sample size but with a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research.
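For concreteness, the extended criterion in this family is commonly written as EBIC_gamma = -2 ln L + k ln n + 2 gamma ln C(p, k), where the model uses k of p candidate covariates, n is the sample size and 0 <= gamma <= 1; gamma = 0 recovers the ordinary BIC. A minimal Python sketch, with the function names `log_binom` and `ebic` as assumptions:

```python
import math


def log_binom(p, k):
    """Natural log of the binomial coefficient C(p, k), via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(log_likelihood, k, n, p, gamma=1.0):
    """Extended BIC for a model using k of p candidate covariates, n observations.

    gamma = 0 gives the ordinary BIC; larger gamma penalizes large model spaces more.
    """
    return (-2.0 * log_likelihood
            + k * math.log(n)
            + 2.0 * gamma * log_binom(p, k))
```

Smaller values are preferred; the extra 2 gamma ln C(p, k) term is what penalizes searching through a very large number of candidate models.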

11.
Identification of causal rare variants that are associated with complex traits poses a central challenge for genome-wide association studies. However, most current research focuses only on testing global association, i.e., whether the rare variants in a given genomic region are collectively associated with the trait. Although some recent work, e.g., the Bayesian risk index method, has tried to address this problem, it is unclear whether such methods can consistently identify the causal rare variants in the small-n-large-p situation. We develop a new Bayesian method, the Bayesian Rare Variant Detector (BRVD), to tackle this problem. The new method simultaneously addresses two issues: (i) (global association test) are any of the variants associated with the disease, and (ii) (causal variant detection) which variants, if any, are driving the association. The BRVD ensures that the causal rare variants are consistently identified in the small-n-large-p situation by imposing appropriate prior distributions on the model and the model-specific parameters. The numerical results indicate that the BRVD is more powerful for testing global association than existing methods, such as the combined multivariate and collapsing test, the weighted sum statistic test, RARECOVER, the sequence kernel association test and the Bayesian risk index, and also more powerful for identifying causal rare variants than the Bayesian risk index method. The BRVD has also been successfully applied to the Early-Onset Myocardial Infarction (EOMI) Exome Sequence Data, where it identified a few causal rare variants that have been verified in the literature.

12.
Knowledge of temporal change in ecological condition is important for the understanding and management of ecosystems. However, analyses of trends in biological condition have been rare, as there are usually too few data points at any single site to use many trend analysis techniques. We used a Bayesian hierarchical model to analyse temporal trends in stream ecological condition (as measured by the invertebrate-based index SIGNAL) across Melbourne, Australia. The Bayesian hierarchical approach assumes dependency amongst the sampling sites. Results for each site "borrow strength" from the other data because model parameter values are assumed to be drawn from a larger common distribution. This leads to robust inference despite the few data that exist at each site. Utilising the flexibility of the Bayesian approach, we also modelled change over time as a function of catchment urbanisation, allowed for potential temporal and spatial autocorrelation of the data and trend estimates, and used prior information to improve the estimate of data uncertainty. We found strong evidence of a widespread decline in SIGNAL scores for edge habitats (areas of little or no flow). The rate of decline was positively associated with catchment urbanisation. There was no evidence of such declines for riffle habitats (areas with rapid and turbulent flow). Melbourne has experienced a decline in rainfall, indicative of drought and/or longer-term climate change. The results are consistent with the expected coupled effects of these rainfall changes and increasing urbanisation, but more research is needed to isolate a causal mechanism. More immediately, however, the Bayesian hierarchical approach has allowed us to identify a pattern in a biological monitoring data set that might otherwise have gone unnoticed, and to demonstrate a large-scale temporal decline in biological condition.
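The idea of sites "borrowing strength" can be illustrated with the simplest normal-normal hierarchical model, assuming the within-site variance `sigma2` and the between-site variance `tau2` are known. This is a simplified stand-in, not the study's model (which also includes trends, urbanisation effects and autocorrelation): each site mean is pulled toward a precision-weighted grand mean, and the pull is stronger for sites with fewer observations. The function name `shrink` is hypothetical.

```python
def shrink(site_means, site_counts, sigma2, tau2):
    """Normal-normal partial pooling with known variances.

    site_means: observed mean at each site; site_counts: observations per site;
    sigma2: within-site variance; tau2: between-site variance of true site means.
    Returns the precision-weighted grand mean and the shrunken site estimates.
    """
    # Each site mean has marginal variance tau2 + sigma2 / n around the grand mean.
    precisions = [1.0 / (tau2 + sigma2 / n) for n in site_counts]
    grand = sum(w * m for w, m in zip(precisions, site_means)) / sum(precisions)
    shrunk = []
    for m, n in zip(site_means, site_counts):
        b = (sigma2 / n) / (sigma2 / n + tau2)   # weight on the grand mean
        shrunk.append(b * grand + (1.0 - b) * m)
    return grand, shrunk
```

With two single-observation sites at 0 and 10 and sigma2 = tau2 = 1, each estimate moves halfway to the grand mean of 5, giving 2.5 and 7.5.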

13.
The Poisson assumption is popular when data arise in the form of counts. In many applications such counts are fallible. Little research has been done on the Poisson distribution when both false positives and false negatives are present. In this paper we present a model that corrects for misclassification of count data, and we develop Bayesian estimators. We obtain the exact posterior distributions via integration; Markov chain Monte Carlo results, which are more convenient for large sample sizes, are also used for inference.

14.
MOTIVATION: Biological assays are often carried out on tissues that contain many cell lineages and active pathways. Microarray data produced using such material therefore reflect superimpositions of biological processes. Analysing such data for shared gene function by means of well-matched assays may help to provide a better focus on specific cell types and processes. The identification of genes that behave similarly in different biological systems also has the potential to reveal new insights into preserved biological mechanisms. RESULTS: In this article, we propose a hierarchical Bayesian model allowing integrated analysis of several microarray data sets for shared gene function. Each gene is associated with an indicator variable that selects whether binary class labels are predicted from expression values or by a classifier which is common to all genes. Each indicator selects the component models for all involved data sets simultaneously. A quantitative measure of shared gene function is obtained by inferring a probability measure over these indicators. Through experiments on synthetic data, we illustrate potential advantages of this Bayesian approach over a standard method. A shared analysis of matched microarray experiments covering (a) a cycle of mouse mammary gland development and (b) the process of in vitro endothelial cell apoptosis is proposed as a biological gold standard. Several useful sanity checks are introduced during data analysis, and we confirm the prior biological belief that shared apoptosis events occur in both systems. We conclude that a Bayesian analysis for shared gene function has the potential to reveal new biological insights, unobtainable by other means. AVAILABILITY: An online supplement and MatLab code are available at http://www.sykacek.net/research.html#mcabf

15.
MOTIVATION: Many biomedical and clinical research problems involve discovering causal relationships between observations gathered from temporal events. Dynamic Bayesian networks are a powerful modeling approach to describe causal or apparently causal relationships, and support complex medical inference, such as future response prediction, automated learning, and rational decision making. Although many engines exist for creating Bayesian networks, most require a local installation and significant data manipulation to be practical for a general biologist or clinician. No software pipeline currently exists for interpretation and inference of dynamic Bayesian networks learned from biomedical and clinical data. RESULTS: miniTUBA is a web-based modeling system that allows clinical and biomedical researchers to perform complex medical/clinical inference and prediction using dynamic Bayesian network analysis with temporal datasets. The software allows users to choose different analysis parameters (e.g. Markov lags and prior topology), and continuously update their data and refine their results. miniTUBA can make temporal predictions to suggest interventions based on an automated learning process pipeline using all data provided. Preliminary tests using synthetic data and laboratory research data indicate that miniTUBA accurately identifies regulatory network structures from temporal data. AVAILABILITY: miniTUBA is available at http://www.minituba.org.

16.
Determining which non-synonymous single nucleotide polymorphisms (nsSNPs) are deleterious, and thus might be involved in inducing disease-associated phenomena, is now among the most important fields of computational genomic research. Rapid advances in sequencing technologies have outpaced the available sequence databases and have greatly increased the amount of SNP data that are yet to be characterized. In this article we have performed a comprehensive analysis of deleterious nsSNPs in the MyH7 gene associated with cardiomyopathy cases using a set of computational platforms. We combined a set of computational SNP analysis platforms with Bayesian calculations in order to filter the mutations most likely to be associated with cardiomyopathy-related disorders. The Bayesian calculation showed a 27-fold rise in the likelihood score for causing cardiomyopathy disorder when MyH7 gene mutations were compiled. Furthermore, we report an E466Q mutation in the MyH7 motor domain that increased the amyloid propensity of the protein and showed a significant level of pathogenicity. The prediction roadmap followed in this article showed a notable range of accuracy and can be used for determining cardiomyopathy-associated nsSNPs in other candidate genes.

17.
Stable isotope analysis of diet has become a common tool in conservation research. However, the multiple sources of uncertainty inherent in this analysis framework involve consequences that have not been thoroughly addressed. Uncertainty arises from the choice of trophic discrimination factors, and for Bayesian stable isotope mixing models (SIMMs), the specification of prior information; the combined effect of these aspects has not been explicitly tested. We used a captive feeding study of gray wolves (Canis lupus) to determine the first experimentally-derived trophic discrimination factors of C and N for this large carnivore of broad conservation interest. Using the estimated diet in our controlled system and data from a published study on wild wolves and their prey in Montana, USA, we then investigated the simultaneous effect of discrimination factors and prior information on diet reconstruction with Bayesian SIMMs. Discrimination factors for gray wolves and their prey were 1.97‰ for δ13C and 3.04‰ for δ15N. Specifying wolf discrimination factors, as opposed to the commonly used red fox (Vulpes vulpes) factors, made little practical difference to estimates of wolf diet, but prior information had a strong effect on bias, precision, and accuracy of posterior estimates. Without specifying prior information in our Bayesian SIMM, it was not possible to produce SIMM posteriors statistically similar to the estimated diet in our controlled study or the diet of wild wolves. Our study demonstrates the critical effect of prior information on estimates of animal diets using Bayesian SIMMs, and suggests species-specific trophic discrimination factors are of secondary importance. When using stable isotope analysis to inform conservation decisions researchers should understand the limits of their data. It may be difficult to obtain useful information from SIMMs if informative priors are omitted and species-specific discrimination factors are unavailable.
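The core of a Bayesian SIMM, and the way the prior enters it, can be sketched for the simplest case of two sources and one isotope. Assumptions: the source means, residual SD and consumer value are hypothetical; a single discrimination factor (here the 1.97‰ wolf value for δ13C) is added to both sources; and the diet proportion p gets a Beta prior evaluated on a grid, rather than the Dirichlet prior and MCMC that full SIMM software would use. The function name is hypothetical.

```python
import numpy as np


def diet_posterior_mean(consumer, src_a, src_b, tdf, sd,
                        prior_a=1.0, prior_b=1.0, grid=999):
    """Posterior mean of p, the proportion of source A in the diet.

    Two sources, one isotope: the expected consumer value is
    p * (src_a + tdf) + (1 - p) * (src_b + tdf), with Normal(0, sd) residuals
    and a Beta(prior_a, prior_b) prior on p, evaluated on a grid over (0, 1).
    """
    p = np.linspace(0.0, 1.0, grid + 2)[1:-1]            # interior points avoid log(0)
    mu = p * (src_a + tdf) + (1.0 - p) * (src_b + tdf)   # mixing equation
    x = np.asarray(consumer, dtype=float)[:, None]       # shape (n_obs, 1)
    log_lik = -0.5 * (((x - mu) / sd) ** 2).sum(axis=0)
    log_prior = (prior_a - 1.0) * np.log(p) + (prior_b - 1.0) * np.log1p(-p)
    log_post = log_lik + log_prior
    w = np.exp(log_post - log_post.max())                # unnormalized grid weights
    return float((p * w).sum() / w.sum())


# Hypothetical δ13C source means (‰); tdf = 1.97 is the wolf value quoted above.
flat = diet_posterior_mean([-22.03], src_a=-26.0, src_b=-22.0, tdf=1.97, sd=1.0)
informative = diet_posterior_mean([-22.03], src_a=-26.0, src_b=-22.0, tdf=1.97,
                                  sd=1.0, prior_a=10.0, prior_b=2.0)
```

With a single consumer value the likelihood is weak, so the informative Beta(10, 2) prior moves the posterior mean well above the flat-prior answer of 0.5, a miniature version of the prior sensitivity described in the abstract.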

18.
In recent years, there has been much interest in characterizing statistical properties of natural stimuli in order to better understand the design of perceptual systems. A fruitful approach has been to compare the processing of natural stimuli in real perceptual systems with that of ideal observers derived within the framework of Bayesian statistical decision theory. While this form of optimization theory has provided a deeper understanding of the information contained in natural stimuli as well as of the computational principles employed in perceptual systems, it does not directly consider the process of natural selection, which is ultimately responsible for design. Here we propose a formal framework for analysing how the statistics of natural stimuli and the process of natural selection interact to determine the design of perceptual systems. The framework consists of two complementary components. The first is a maximum fitness ideal observer, a standard Bayesian ideal observer with a utility function appropriate for natural selection. The second component is a formal version of natural selection based upon Bayesian statistical decision theory. Maximum fitness ideal observers and Bayesian natural selection are demonstrated in several examples. We suggest that the Bayesian approach is appropriate not only for the study of perceptual systems but also for the study of many other systems in biology.
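A Bayesian ideal observer with a fitness-based utility can be sketched as choosing the action that maximizes expected utility under the posterior. The function name `best_action` and the states, likelihoods and payoffs below are purely hypothetical: a rustle is weak evidence for a predator, but fleeing costs little while staying near a predator is costly.

```python
def best_action(prior, likelihood, utility, observation):
    """Return the expected-utility-maximizing action and the posterior over states.

    prior: {state: P(state)}
    likelihood: {state: {observation: P(observation | state)}}
    utility: {(action, state): payoff}, e.g. a fitness consequence
    """
    # Bayes' rule: posterior is proportional to prior times likelihood.
    unnormalized = {s: prior[s] * likelihood[s][observation] for s in prior}
    z = sum(unnormalized.values())
    posterior = {s: v / z for s, v in unnormalized.items()}

    def expected_utility(action):
        return sum(posterior[s] * utility[(action, s)] for s in posterior)

    actions = sorted({a for a, _ in utility})  # sorted for deterministic ties
    return max(actions, key=expected_utility), posterior


# Hypothetical example: is a rustle in the grass a predator?
PRIOR = {"predator": 0.1, "none": 0.9}
LIKELIHOOD = {"predator": {"rustle": 0.9, "quiet": 0.1},
              "none": {"rustle": 0.2, "quiet": 0.8}}
UTILITY = {("flee", "predator"): 0.0, ("flee", "none"): -1.0,
           ("stay", "predator"): -10.0, ("stay", "none"): 0.0}
```

After a rustle the posterior still favours "none" (2/3), so an observer that simply picks the most probable state would stay, whereas the utility-weighted observer flees; that difference is what a utility function appropriate for natural selection captures.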

19.

This article describes the application of a simplified Bayesian method for estimation of doses from a mixed field using cytogenetic biological dosimetry, taking as an example neutron and gamma radiation emitted from the MARIA nuclear research reactor in Poland. The Bayesian approach is a good alternative to the commonly used iterative method, which allows separate dose estimation. In the present paper, a computer program, which uses the iterative and simplified Bayesian methods to calculate mixed radiation doses, is introduced.


20.
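With the rapid development of mass spectrometry technology, proteomics has become another research hotspot following genomics and transcriptomics, and finding reliable differentially expressed proteins is crucial for biomarker discovery. Therefore, how to screen for differential proteins accurately and sensitively has become one of the main research topics in mass spectrometry-based quantitative proteomics. Many methods currently address this problem, but their ranges of applicability differ. Broadly, statistical strategies for screening differential proteins from mass spectrometry data fall into three categories: strategies based on classical statistics, statistical testing strategies based on the Bayesian school, and other strategies; each category has its own scope of application, characteristics and shortcomings. In addition, the screening process also produces some false-positive results; other methods can be used to control the quality of the differentially expressed proteins and thereby improve the reliability of the statistical test results.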


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417