1.
Marco J. Morelli Gaël Thébaud Joël Chadœuf Donald P. King Daniel T. Haydon Samuel Soubeyrand 《PLoS computational biology》2012,8(11)
The accurate identification of the route of transmission taken by an infectious agent through a host population is critical to understanding its epidemiology and informing measures for its control. However, reconstruction of transmission routes during an epidemic is often an underdetermined problem: data about the location and timings of infections can be incomplete, inaccurate, and compatible with a large number of different transmission scenarios. For fast-evolving pathogens like RNA viruses, inference can be strengthened by using genetic data, nowadays easily and affordably generated. However, significant statistical challenges remain to be overcome in the full integration of these different data types if transmission trees are to be reliably estimated. We present here a framework leading to a Bayesian inference scheme that combines genetic and epidemiological data and is able to reconstruct the most likely transmission patterns and infection dates. After testing our approach with simulated data, we apply the method to two UK epidemics of Foot-and-Mouth Disease Virus (FMDV): the 2007 outbreak, and a subset of the large 2001 epidemic. In the first case, we are able to confirm the role of a specific premise as the link between the two phases of the epidemic, while transmissions more densely clustered in space and time remain harder to resolve. When we consider data collected from the 2001 epidemic during a time of national emergency, our inference scheme robustly infers transmission chains and uncovers the presence of undetected premises, thus providing a useful tool for epidemiological studies in real time. The generation of genetic data is becoming routine in epidemiological investigations, but the development of analytical tools maximizing the value of these data remains a priority. Our method, while applied here in the context of FMDV, is general and with slight modification can be used in any situation where both spatiotemporal and genetic data are available.
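The way genetic and epidemiological signals can be combined is easier to see with a deliberately small sketch. The snippet below is not the authors' model: it simply scores every candidate infector of each case by the product of a temporal likelihood on the time between infections and a Poisson likelihood on the number of substitutions separating the sampled sequences, then keeps the highest-scoring infector. All case data, distributions and parameter values (generation-interval mean, substitution rate) are invented for illustration.

```python
import math

# Toy outbreak: infection day for each case (invented values).
cases = {"A": 0, "B": 6, "C": 9, "D": 15}
# Pairwise SNP distances between sampled consensus sequences (invented, symmetric).
snp = {("A", "B"): 2, ("A", "C"): 5, ("A", "D"): 9,
       ("B", "C"): 3, ("B", "D"): 7, ("C", "D"): 4}

def snp_dist(i, j):
    return snp[(i, j)] if (i, j) in snp else snp[(j, i)]

GEN_MEAN = 7.0   # assumed mean generation interval in days
SUB_RATE = 0.5   # assumed expected substitutions per day separating two cases

def log_score(infector, infectee):
    """Log pseudo-likelihood: exponential generation interval x Poisson substitution count."""
    dt = cases[infectee] - cases[infector]
    if dt <= 0:
        return float("-inf")          # the infector must have been infected earlier
    log_temporal = -dt / GEN_MEAN - math.log(GEN_MEAN)
    lam = SUB_RATE * dt
    k = snp_dist(infector, infectee)
    log_genetic = k * math.log(lam) - lam - math.lgamma(k + 1)
    return log_temporal + log_genetic

for infectee in ("B", "C", "D"):
    best = max((c for c in cases if c != infectee),
               key=lambda c: log_score(c, infectee))
    print(f"{infectee}: most plausible infector is {best}")
```

A full reconstruction would instead explore whole transmission trees (and unobserved infection dates) jointly, for example by MCMC, rather than picking each infector independently.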
2.
Qing Xie Qi Liu Fengbiao Mao Wanshi Cai Honghu Wu Mingcong You Zhen Wang Bingyu Chen Zhong Sheng Sun Jinyu Wu 《PLoS computational biology》2014,10(9)
High-throughput bisulfite sequencing technologies have provided a comprehensive and well-suited way to investigate DNA methylation at single-base resolution. However, there are substantial bioinformatic challenges in precisely distinguishing methylcytosines from unconverted cytosines based on bisulfite sequencing data. The challenges arise, at least in part, from cell heterozygosis caused by multicellular sequencing and the still limited number of statistical methods that are available for methylcytosine calling from bisulfite sequencing data. Here, we present an algorithm, termed Bycom, a new Bayesian model that can perform methylcytosine calling with high accuracy. Bycom considers cell heterozygosis along with sequencing errors and bisulfite conversion efficiency to improve calling accuracy. Bycom performance was compared with the performance of Lister, the method most widely used to identify methylcytosines from bisulfite sequencing data. The results showed that the performance of Bycom was better than that of Lister for data with high methylation levels. Bycom also showed higher sensitivity and specificity than Lister for samples with low methylation levels (<1%). A validation experiment based on reduced representation bisulfite sequencing data suggested that Bycom had a false positive rate of about 4% while maintaining an accuracy of close to 94%. This study demonstrated that Bycom had a low false calling rate at any methylation level and accurate methylcytosine calling at high methylation levels. Bycom will contribute significantly to studies aimed at recalibrating the methylation level of genomic regions based on the presence of methylcytosines.
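To illustrate the kind of per-site calculation such a caller performs (a minimal sketch, not Bycom's actual model), the posterior probability that a cytosine is methylated can be obtained from the unconverted ('C') and converted ('T') read counts with a binomial likelihood whose success probability folds in an assumed bisulfite conversion efficiency and sequencing error rate. All parameter values below are assumptions for illustration.

```python
from math import comb

def p_methylated(n_c, n_t, conv=0.99, err=0.01, prior_m=0.5):
    """Posterior P(methylated | n_c 'C' reads, n_t 'T' reads) at one cytosine.

    conv    : assumed bisulfite conversion efficiency for unmethylated cytosines
    err     : assumed per-base sequencing error rate
    prior_m : prior probability that the site is methylated
    """
    n = n_c + n_t
    p_c_if_meth = 1.0 - err              # methylated C is read as 'C' unless miscalled
    p_c_if_unmeth = (1.0 - conv) + err   # unmethylated C reads as 'C' if it escaped conversion or is miscalled
    lik_m = comb(n, n_c) * p_c_if_meth ** n_c * (1 - p_c_if_meth) ** n_t
    lik_u = comb(n, n_c) * p_c_if_unmeth ** n_c * (1 - p_c_if_unmeth) ** n_t
    return prior_m * lik_m / (prior_m * lik_m + (1 - prior_m) * lik_u)

print(p_methylated(n_c=18, n_t=2))   # high 'C' fraction: posterior close to 1
print(p_methylated(n_c=1, n_t=19))   # high 'T' fraction: posterior close to 0
```

Bycom additionally accounts for cell heterozygosis, so a realistic caller would model intermediate methylation levels rather than the simple methylated/unmethylated dichotomy used above.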
3.
4.
5.
Background
Health inequities in developing countries are difficult to eradicate because of limited resources. The neglect of adult mortality in Sub-Saharan Africa (SSA) is a particular concern. Advances in data availability, software and analytic methods have created opportunities to address this challenge and tailor interventions to small areas. This study demonstrates how a generic framework can be applied to guide policy interventions to reduce adult mortality in high risk areas. The framework, therefore, incorporates the spatial clustering of adult mortality, estimates the impact of a range of determinants and quantifies the impact of their removal to ensure optimal returns on scarce resources.
Methods
Data from a national cross-sectional survey in 2007 were used to illustrate the use of the generic framework for SSA and elsewhere. Adult mortality proportions were analyzed at four administrative levels and spatial analyses were used to identify areas with significant excess mortality. An ecological approach was then used to assess the relationship between mortality “hotspots” and various determinants. Population attributable fractions were calculated to quantify the reduction in mortality as a result of targeted removal of high-impact determinants.
Results
The overall adult mortality rate was 145 per 10,000. Spatial disaggregation identified a highly non-random pattern, and 67 significant high-risk local municipalities were identified. The most prominent determinants of adult mortality included HIV antenatal sero-prevalence, low socioeconomic status and lack of formal marital union status. The removal of the most attributable factors, based on local area prevalence, suggests that overall adult mortality could potentially be reduced by ∼90 deaths per 10,000.
Conclusions
The innovative use of secondary data and advanced epidemiological techniques can be combined in a generic framework to identify and map mortality to the lowest administrative level. The identification of high-risk mortality determinants allows health authorities to tailor interventions at the local level. This approach can be replicated elsewhere.
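The population attributable fraction step can be made concrete with a short sketch based on Levin's formula, PAF = p(RR − 1) / (1 + p(RR − 1)), where p is the exposure prevalence and RR the relative risk. The prevalences and relative risks below are placeholders, not estimates from the study; only the overall mortality rate (145 per 10,000) is taken from the abstract.

```python
def paf(prevalence, relative_risk):
    """Levin's population attributable fraction for a single exposure."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

mortality_rate = 145.0   # deaths per 10,000 (reported overall rate)

# Hypothetical local-area determinants: (prevalence, relative risk) -- illustrative only.
determinants = {
    "HIV antenatal sero-prevalence": (0.30, 3.0),
    "low socioeconomic status":      (0.40, 1.8),
    "no formal marital union":       (0.50, 1.4),
}

for name, (p, rr) in determinants.items():
    fraction = paf(p, rr)
    print(f"{name}: PAF = {fraction:.2f}, "
          f"avertable deaths ≈ {fraction * mortality_rate:.0f} per 10,000")
```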
6.
Shinya Tasaki Ben Sauerwine Bruce Hoff Hiroyoshi Toyoshiba Chris Gaiteri Elias Chaibub Neto 《Genetics》2015,199(4):973-989
Reconstructing biological networks using high-throughput technologies has the potential to produce condition-specific interactomes. But are these reconstructed networks a reliable source of biological interactions? Do some network inference methods offer dramatically improved performance on certain types of networks? To facilitate the use of network inference methods in systems biology, we report a large-scale simulation study comparing the ability of Markov chain Monte Carlo (MCMC) samplers to reverse engineer Bayesian networks. The MCMC samplers we investigated included foundational and state-of-the-art Metropolis–Hastings and Gibbs sampling approaches, as well as novel samplers we have designed. To enable a comprehensive comparison, we simulated gene expression and genetics data from known network structures under a range of biologically plausible scenarios. We examine the overall quality of network inference via different methods, as well as how their performance is affected by network characteristics. Our simulations reveal that network size, edge density, and strength of gene-to-gene signaling are major parameters that differentiate the performance of various samplers. Specifically, more recent samplers including our novel methods outperform traditional samplers for highly interconnected large networks with strong gene-to-gene signaling. Our newly developed samplers show comparable or superior performance to the top existing methods. Moreover, this performance gain is strongest in networks with biologically oriented topology, which indicates that our novel samplers are suitable for inferring biological networks. The performance of MCMC samplers in this simulation framework can guide the choice of methods for network reconstruction using systems genetics data.
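A minimal Metropolis–Hastings structure sampler, far simpler than any of the samplers compared in the study, is sketched below to make the inference target concrete: a single directed edge is toggled at each step, proposals that create a cycle are rejected, and candidate DAGs are scored with a Gaussian BIC used as a stand-in for the log posterior. The data-generating chain, sample sizes and iteration count are all illustrative assumptions.

```python
import itertools
import random
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_obs = 4, 200

# Simulate data from an assumed ground-truth chain 0 -> 1 -> 2 -> 3.
X = np.zeros((n_obs, n_vars))
X[:, 0] = rng.normal(size=n_obs)
for j in range(1, n_vars):
    X[:, j] = 0.8 * X[:, j - 1] + rng.normal(scale=0.5, size=n_obs)

def has_cycle(edges):
    """Depth-first search for a directed cycle."""
    children = {v: [c for p, c in edges if p == v] for v in range(n_vars)}
    state = {}
    def visit(v):
        if state.get(v) == 1:
            return True
        if state.get(v) == 2:
            return False
        state[v] = 1
        if any(visit(c) for c in children[v]):
            return True
        state[v] = 2
        return False
    return any(visit(v) for v in range(n_vars))

def bic(edges):
    """Gaussian BIC: regress every node on its parents by least squares."""
    score = 0.0
    for j in range(n_vars):
        parents = [p for p, c in edges if c == j]
        design = np.column_stack([np.ones(n_obs)] + [X[:, p] for p in parents])
        beta = np.linalg.lstsq(design, X[:, j], rcond=None)[0]
        rss = float(np.sum((X[:, j] - design @ beta) ** 2))
        score += -0.5 * n_obs * np.log(rss / n_obs) - 0.5 * (len(parents) + 1) * np.log(n_obs)
    return score

possible_edges = list(itertools.permutations(range(n_vars), 2))
edges, current = set(), bic(set())
for _ in range(5000):
    proposal = set(edges)
    proposal.symmetric_difference_update({random.choice(possible_edges)})  # toggle one edge
    if has_cycle(proposal):
        continue                                     # simplification: cyclic proposals are rejected
    candidate = bic(proposal)
    if np.log(random.random()) < candidate - current:  # MH acceptance for a symmetric proposal
        edges, current = proposal, candidate

print("last sampled structure:", sorted(edges))
```

Systems-genetics samplers add genotype nodes, richer scores and smarter proposal moves, but the accept/reject skeleton is the same.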
7.
We consider a set of sample counts obtained by sampling arbitrary fractions of a finite volume containing a homogeneously dispersed population of identical objects. We report a Bayesian derivation of the posterior probability distribution of the population size using a binomial likelihood and non-conjugate, discrete uniform priors under sampling with or without replacement. Our derivation yields a computationally feasible formula that can prove useful in a variety of statistical problems involving absolute quantification under uncertainty. We implemented our algorithm in the R package dupiR and compared it with a previously proposed Bayesian method based on a Gamma prior. As a showcase, we demonstrate that our inference framework can be used to estimate bacterial survival curves from measurements characterized by extremely low or zero counts and rather high sampling fractions. All in all, we provide a versatile, general-purpose algorithm to infer population sizes from count data, which can find application in a broad spectrum of biological and physical problems.
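A hedged sketch of the basic calculation (not the dupiR implementation itself) is shown below: given k objects counted in a sampled fraction f of the volume, a Binomial(N, f) likelihood combined with a discrete uniform prior on the population size N yields a posterior that is easy to normalize numerically. The observed count, sampling fraction and prior upper bound are illustrative assumptions.

```python
from math import comb

def posterior_N(k, f, n_max=500):
    """Posterior over the population size N given k objects counted in a sampled fraction f,
    using a Binomial(N, f) likelihood and a discrete uniform prior on {k, ..., n_max}."""
    support = range(k, n_max + 1)
    weights = [comb(n, k) * f ** k * (1 - f) ** (n - k) for n in support]
    total = sum(weights)
    return {n: w / total for n, w in zip(support, weights)}

post = posterior_N(k=3, f=0.2)   # e.g. 3 colonies counted after plating 20% of a culture
mean_N = sum(n * p for n, p in post.items())
mode_N = max(post, key=post.get)
print(f"posterior mean ≈ {mean_N:.1f}, posterior mode = {mode_N}")
```

With very low or zero counts the posterior remains well defined, which is the property exploited when estimating survival curves from such data.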
8.
9.
Heterogeneous networked clusters are being increasingly used as platforms for resource-intensive parallel and distributed applications. The fundamental underlying idea is to provide large amounts of processing capacity over extended periods of time by harnessing the idle and available resources on the network in an opportunistic manner. In this paper we present the design, implementation and evaluation of a framework that uses JavaSpaces to support this type of opportunistic adaptive parallel/distributed computing over networked clusters in a non-intrusive manner. The framework targets applications exhibiting coarse grained parallelism and has three key features: (1) portability across heterogeneous platforms, (2) minimal configuration overheads for participating nodes, and (3) automated system state monitoring (using SNMP) to ensure non-intrusive behavior. Experimental results presented in this paper demonstrate that for applications that can be broken into coarse-grained, relatively independent tasks, the opportunistic adaptive parallel computing framework can provide performance gains. Furthermore, the results indicate that monitoring and reacting to the current system state minimizes the intrusiveness of the framework.
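The coarse-grained master–worker pattern that such a tuple-space framework builds on can be illustrated with a short analogy using Python's standard library (this is not JavaSpaces code and does not reproduce the framework's SNMP-based monitoring): a master splits the problem into independent chunks, idle workers pull whatever work is available, and results are collected as they complete.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def coarse_grained_task(chunk):
    """Stand-in for an independent, compute-heavy work unit (illustrative only)."""
    return sum(i * i for i in chunk)

def master(data, n_tasks=8, n_workers=4):
    chunks = [data[i::n_tasks] for i in range(n_tasks)]      # coarse-grained, independent pieces
    partials = []
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(coarse_grained_task, c) for c in chunks]
        for fut in as_completed(futures):                    # results arrive as workers finish
            partials.append(fut.result())
    return sum(partials)

if __name__ == "__main__":
    print(master(list(range(1_000_000))))
```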
10.
In mammalian cells, transcribed enhancers (TrEns) play important roles in the initiation of gene expression and maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions is how the genomic characteristics of enhancers relate to enhancer activities. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers’ DNA code in a more systematic way. To address this problem, we developed a novel computational framework, Transcribed Enhancer Landscape Search (TELS), aimed at identifying predictive cell type/tissue-specific motif signatures of TrEns. As a case study, we used TELS to compile a comprehensive catalog of motif signatures for all known TrEns identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that combinations of different short motifs characterize cell type/tissue-specific TrEns in an optimized manner. Our study is the first to report combinations of motifs that maximize classification performance of TrEns exclusively transcribed in one cell type/tissue against TrEns exclusively transcribed in different cell types/tissues. Moreover, we also report 31 motif signatures predictive of enhancers’ broad activity. TELS codes and material are publicly available at http://www.cbrc.kaust.edu.sa/TELS.
11.
The choice of summary statistics is a crucial step in approximate Bayesian computation (ABC). Since statistics are often not sufficient, this choice involves a trade-off between loss of information and reduction of dimensionality. The latter may increase the efficiency of ABC. Here, we propose an approach for choosing summary statistics based on boosting, a technique from the machine-learning literature. We consider different types of boosting and compare them to partial least-squares regression as an alternative. To mitigate the lack of sufficiency, we also propose an approach for choosing summary statistics locally, in the putative neighborhood of the true parameter value. We study a demographic model motivated by the reintroduction of Alpine ibex (Capra ibex) into the Swiss Alps. The parameters of interest are the mean and standard deviation across microsatellites of the scaled ancestral mutation rate (θ_anc = 4N_e u) and the proportion of males obtaining access to matings per breeding season (ω). By simulation, we assess the properties of the posterior distribution obtained with the various methods. According to our criteria, ABC with summary statistics chosen locally via boosting with the L2-loss performs best. Applying that method to the ibex data, we find that most of the variation across loci of the ancestral mutation rate u is between 7.7 × 10^−4 and 3.5 × 10^−3 per locus per generation, and the estimated proportion of males with access to matings is in good agreement with recent independent estimates.
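Setting aside the boosting-based choice of statistics, the ABC step itself can be sketched in a few lines: draw parameters from the prior, simulate data, summarize, and keep the draws whose summaries fall within a tolerance of the observed summaries. The toy model, prior, summaries and tolerance below are assumptions for illustration and are unrelated to the ibex analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, n=100):
    """Toy generative model: observations are Normal(theta, 1)."""
    return rng.normal(loc=theta, scale=1.0, size=n)

def summaries(x):
    return np.array([x.mean(), x.std()])

observed = summaries(simulate(theta=2.0))      # pretend these are the observed summary statistics

accepted = []
for _ in range(20_000):
    theta = rng.uniform(-5, 5)                 # draw from the prior
    if np.linalg.norm(summaries(simulate(theta)) - observed) < 0.2:   # tolerance (assumed)
        accepted.append(theta)

accepted = np.array(accepted)
print(f"ABC posterior mean ≈ {accepted.mean():.2f} from {accepted.size} accepted draws")
```

Roughly speaking, the local choice of summary statistics the authors propose re-selects the statistics within a putative neighborhood of the true parameter value before the final ABC pass.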
12.
Objectives
Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone.
Methods
In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed two data mining methods (an artificial neural network (ANN) and a decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratios and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models.
Results
Our proposed data mining procedures outperformed the classic statistical method. The correct classification rate, sensitivity, specificity and area under the ROC curve for predicting a rotator cuff tear were statistically better in the ANN and decision tree models than in logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear, given a pretest probability and a prediction result (tear or no tear).
Conclusions
Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools for classifying rotator cuff tears and for determining the probability of the presence of the disease, enhancing diagnostic decision making for rotator cuff tears.
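The Bayesian updating behind the nomogram is simple enough to show as a worked sketch: convert the pretest probability to odds, multiply by the likelihood ratio associated with the prediction result, and convert back to a probability. The likelihood ratios and pretest probability below are placeholders, not the values derived from the study's models.

```python
def post_test_probability(pretest_p, likelihood_ratio):
    """Bayes' theorem in odds form (the calculation summarized by Fagan's nomogram)."""
    pretest_odds = pretest_p / (1.0 - pretest_p)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

LR_POSITIVE, LR_NEGATIVE = 6.0, 0.2   # placeholder likelihood ratios for "tear" / "no tear" predictions
pretest = 0.40                        # assumed pretest probability of a tear

print(f"after a 'tear' prediction:    {post_test_probability(pretest, LR_POSITIVE):.2f}")
print(f"after a 'no tear' prediction: {post_test_probability(pretest, LR_NEGATIVE):.2f}")
```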
13.
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests, and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods, including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium. Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn's disease; and one novel association implicating a gene involved in apoptosis pathways in rheumatoid arthritis. We provide software for applying our PUMA analysis framework.
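The core idea of penalized multiple-locus regression is easy to demonstrate with an off-the-shelf Lasso (this sketch is not the PUMA implementation, which uses its own MM algorithm, model selection heuristics and additional penalties); the genotype matrix, causal SNPs and penalty strength are simulated assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n_individuals, n_snps = 500, 2000

# Simulated genotypes coded 0/1/2 and a phenotype driven by three causal SNPs (assumptions).
G = rng.integers(0, 3, size=(n_individuals, n_snps)).astype(float)
causal = [10, 500, 1500]
y = G[:, causal] @ np.array([0.5, 0.4, 0.3]) + rng.normal(size=n_individuals)

# All SNPs enter a single regression; the L1 penalty shrinks most coefficients exactly to zero.
model = Lasso(alpha=0.05).fit(G, y)
selected = np.flatnonzero(model.coef_)
print("SNPs with nonzero coefficients:", selected)
```

In practice the penalty strength is tuned, and heuristics of the kind the authors describe then test the surviving candidates for robustness before an association is called.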
14.
Fischer KJ Jacobs CR Levenston ME Cody DD Carter DR 《Computer methods in biomechanics and biomedical engineering》1998,1(3):233-245
A density-based load estimation method was applied to determine femoral load patterns. Two-dimensional finite element models were constructed using single energy quantitative computed tomography (QCT) data from two femora. Basic load cases included parabolic pressure joint loads and constant tractions on the greater trochanter. An optimization procedure adjusted magnitudes of the basic load cases, such that the applied mechanical stimulus approached the ideal stimulus throughout each model. Dominant estimated load directions were generally consistent with published experimental data for gait. Other estimated loads suggested that loads at extreme joint orientations may be important to maintenance of bone structure. Remodeling simulations with the estimated loads produced density distributions qualitatively similar to the QCT data sets. Average nodal density errors between QCT data and predictions were 0.24 g/cm³ and 0.28 g/cm³. The results indicate that density-based load estimation could improve understanding of loading patterns on bones.
15.
Understanding the root molecular and genetic causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. In this study, we developed a framework to integrate gene expression and genotype data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genotypic variation. This framework utilizes pathway analysis and multi-task learning to build a predictive model and discover pathways relevant to the complex trait of interest. We simulated expression and genotype data to test the predictive ability of our framework and to measure how well it uncovered pathways with genes both differentially expressed and genetically associated with a complex trait. We found that the predictive performance of the multi-task model was comparable to other similar methods. Also, methods like multi-task learning that considered enrichment analysis scores from both data sets found pathways with both genetic and expression differences related to the phenotype. We used our framework to analyze differences between estrogen receptor (ER) positive and negative breast cancer samples. An analysis of the top 15 gene sets from the multi-task model showed they were all related to estrogen, steroids, cell signaling, or the cell cycle. Although our study suggests that multi-task learning does not enhance predictive accuracy, the models generated by our framework do provide valuable biological pathway knowledge for complex traits.
16.
The neural patterns recorded during a neuroscientific experiment reflect complex interactions between many brain regions, each comprising millions of neurons. However, the measurements themselves are typically abstracted from that underlying structure. For example, functional magnetic resonance imaging (fMRI) datasets comprise a time series of three-dimensional images, where each voxel in an image (roughly) reflects the activity of the brain structure(s) located at the corresponding point in space at the time the image was collected. fMRI data often exhibit strong spatial correlations, whereby nearby voxels behave similarly over time as the underlying brain structure modulates its activity. Here we develop topographic factor analysis (TFA), a technique that exploits spatial correlations in fMRI data to recover the underlying structure that the images reflect. Specifically, TFA casts each brain image as a weighted sum of spatial functions. The parameters of those spatial functions, which may be learned by applying TFA to an fMRI dataset, reveal the locations and sizes of the brain structures activated while the data were collected, as well as the interactions between those structures.
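The "weighted sum of spatial functions" has a compact mathematical form. One common choice of spatial function is a radial basis function, which is assumed here purely for illustration (the notation below is generic, not taken from the paper):

```latex
% Illustrative topographic factor model for image n at voxel location v:
% K factors, per-image weights w_{nk}, factor centres \mu_k and spatial extents \lambda_k.
y_n(v) \;\approx\; \sum_{k=1}^{K} w_{nk}\,
  \exp\!\left(-\frac{\lVert v - \mu_k \rVert^{2}}{\lambda_k}\right)
```

Fitting the centres and widths locates and sizes the active structures, while correlations among the per-image weights describe interactions between them.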
17.
Thomas H. Scheike 《Biometrical journal. Biometrische Zeitschrift》1997,39(1):57-67
This paper reviews a general framework for the modelling of longitudinal data with random measurement times based on marked point processes and presents a worked example. We construct quite general regression models for longitudinal data, which may in particular include censoring that depends only on the past and on outside random variation, as well as dependencies between measurement times and measurements. The modelling also generalises statistical counting process models. We review a non-parametric Nadaraya–Watson kernel estimator of the regression function, and a parametric analysis based on a conditional least squares (CLS) criterion. The parametric analysis presented is a conditional version of the generalised estimating equations of Liang and Zeger (1986). We conclude that the usual nonparametric and parametric regression modelling can be applied to this general set-up, with some modifications. The presented framework provides an easily implemented and powerful tool for model building for repeated measurements.
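The nonparametric part has a compact closed form; a hedged sketch of a Nadaraya–Watson estimate with a Gaussian kernel (the bandwidth and the simulated irregular measurement times are illustrative assumptions) is:

```python
import numpy as np

def nadaraya_watson(t_obs, y_obs, t_grid, bandwidth=0.5):
    """Kernel-weighted local average: m(t) = sum_i K((t - t_i)/h) y_i / sum_i K((t - t_i)/h)."""
    t_obs, y_obs = np.asarray(t_obs, float), np.asarray(y_obs, float)
    estimates = []
    for t in t_grid:
        weights = np.exp(-0.5 * ((t - t_obs) / bandwidth) ** 2)   # Gaussian kernel
        estimates.append(np.sum(weights * y_obs) / np.sum(weights))
    return np.array(estimates)

# Irregular (random) measurement times, as in the marked point process setting.
rng = np.random.default_rng(2)
times = np.sort(rng.uniform(0, 10, size=60))
values = np.sin(times) + rng.normal(scale=0.3, size=times.size)
grid = np.linspace(0.5, 9.5, 10)
print(np.round(nadaraya_watson(times, values, grid), 2))
```

In the parametric analysis this smoother is replaced by a model fitted via the conditional least squares criterion.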
18.
Estimation of the location and magnitude of the optimum has long been considered an important problem in response surface methodology. In the industrial context, prior information accumulated by the subject matter specialist bears special significance. In this paper we use the Bayesian approach to estimate the optimum in a single factor quadratic regression model. Following the Bayesian general linear model development of Broemeling, the normal/gamma conjugate prior is used. Explicit formulas for the generalized maximum likelihood estimates of the characteristic parameters are obtained from the joint posterior distribution.
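For a single-factor quadratic response the quantities being estimated have a standard closed form, shown here for orientation (the notation is generic, not the paper's):

```latex
% Mean response E[y \mid x] = \beta_0 + \beta_1 x + \beta_2 x^2 (with \beta_2 < 0 for a maximum):
x_{\mathrm{opt}} = -\frac{\beta_1}{2\beta_2},
\qquad
y_{\mathrm{opt}} = \beta_0 - \frac{\beta_1^{2}}{4\beta_2}
```

In the Bayesian treatment these become functions of the regression coefficients, so their posterior behavior follows from the joint normal/gamma posterior of (β0, β1, β2).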
19.
Comparative effectiveness research aims, in part, to provide evidence most relevant to clinical decision making. One decision relevant to hypertensive patients is which therapeutic drug class is the safest and most effective. In addition, once a drug class has been chosen it would be useful to know whether there are differences in effectiveness between drugs within the class. Randomized trials are unlikely to provide sufficient evidence for answering these questions. We therefore propose a modeling approach that can be used to address the questions using administrative databases. We propose a Bayesian hierarchical model, where drugs are nested within their corresponding class. We account for the type of missing data that are common in these databases using a pattern mixture model. The methodology is illustrated using data from a comparative effectiveness study of antihypertensive medications.
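One generic way to write down such a nested structure (an illustration of the idea only; the authors' exact likelihood, covariates and priors are not reproduced here) is:

```latex
% Outcome for patient i taking drug d in class c; all distributions shown are illustrative.
y_{icd} \sim \mathcal{N}(\theta_{cd},\, \sigma^{2}), \qquad
\theta_{cd} \sim \mathcal{N}(\mu_{c},\, \tau^{2}_{\mathrm{drug}}), \qquad
\mu_{c} \sim \mathcal{N}(\mu_{0},\, \tau^{2}_{\mathrm{class}})
```

Drug-level effects θ_cd borrow strength from other drugs in the same class, while the class means μ_c support the between-class comparison; a pattern mixture model then lets the outcome distribution differ by missing-data pattern.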
20.
To study the lifetimes of certain engineering processes, a lifetime model that can accommodate the nature of such processes is desired. Mixture models of underlying lifetime distributions are intuitively more appropriate and appealing for modelling the heterogeneous nature of a process than simple models. This paper studies a 3-component mixture of Rayleigh distributions from a Bayesian perspective. A censored sampling environment is considered due to its popularity in reliability theory and survival analysis. The expressions for the Bayes estimators and their posterior risks are derived under different scenarios. For the case that little or no prior information is available, elicitation of hyperparameters is given. To examine, numerically, the performance of the Bayes estimators using non-informative and informative priors under different loss functions, we have simulated their statistical properties for different sample sizes and test termination times. In addition, to highlight the practical significance, an illustrative example based on real-life engineering data is also given.
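For reference, a 3-component Rayleigh mixture density has the form below; the scale parameterization shown is one common convention and is an assumption here rather than necessarily the paper's notation:

```latex
% Three-component Rayleigh mixture with weights p_1 + p_2 + p_3 = 1 and scale parameters \lambda_i:
f(x \mid \mathbf{p}, \boldsymbol{\lambda})
  = \sum_{i=1}^{3} p_i\, \frac{x}{\lambda_i^{2}}
    \exp\!\left(-\frac{x^{2}}{2\lambda_i^{2}}\right),
  \qquad x > 0
```

Under censoring at a fixed test termination time, the likelihood combines this density for observed failures with the corresponding survival function for units still running at termination.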