1.
2.
3.
Analysing panel count data with informative observation times
4.
Randomized trials with dropouts or censored data and discrete time-to-event outcomes are frequently analyzed using the Kaplan-Meier or product limit (PL) estimation method. However, the PL method assumes that the censoring mechanism is noninformative, and when this assumption is violated, the inferences may not be valid. We propose an expanded PL method using a Bayesian framework to incorporate an informative censoring mechanism and perform sensitivity analysis on estimates of the cumulative incidence curves. The expanded method uses a model, which can be viewed as a pattern mixture model, in which the odds of having an event during the follow-up interval $$({t}_{k-1},{t}_{k}]$$, conditional on being at risk at $${t}_{k-1}$$, differ across the patterns of missing data. The sensitivity parameters relate the odds of an event between subjects from a missing-data pattern and the observed subjects for each interval. The large number of sensitivity parameters is reduced by treating them as random and assuming they follow a log-normal distribution with prespecified mean and variance. We then vary the mean and variance to explore the sensitivity of inferences. The missing at random (MAR) mechanism is a special case of the expanded model, allowing exploration of the sensitivity of inferences to departures from the MAR assumption. The proposed approach is applied to data from the TRial Of Preventing HYpertension.
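For reference, the classical product-limit (Kaplan-Meier) estimator that this article extends can be sketched as follows. This is a minimal implementation of the noninformative-censoring baseline only, not the authors' expanded Bayesian method, and the function name is illustrative:

```python
import numpy as np

def product_limit(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.
    times: observed times (event or censoring); events: 1 = event, 0 = censored."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    surv = 1.0
    curve = []
    for i, (t, d) in enumerate(zip(times, events)):
        at_risk = n - i          # subjects still at risk just before time t
        if d == 1:               # the curve steps down only at event times
            surv *= (at_risk - 1) / at_risk
        curve.append((t, surv))
    return curve
```

The cumulative incidence discussed in the abstract is simply one minus this survival estimate.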
5.
As the extent of human genetic variation becomes more fully characterized, the research community is faced with the challenging task of using this information to dissect the heritable components of complex traits. Genomewide association studies offer great promise in this respect, but their analysis poses formidable difficulties. In this article, we describe a computationally efficient approach to mining genotype-phenotype associations that scales to the size of the data sets currently being collected in such studies. We use discrete graphical models as a data-mining tool, searching for single- or multilocus patterns of association around a causative site. The approach is fully Bayesian, allowing us to incorporate prior knowledge on the spatial dependencies around each marker due to linkage disequilibrium, which considerably reduces the number of possible graphical structures. A Markov chain Monte Carlo scheme is developed that yields samples from the posterior distribution of graphs conditional on the data, from which probabilistic statements about the strength of any genotype-phenotype association can be made. Using data simulated under scenarios that vary in marker density, genotype relative risk of a causative allele, and mode of inheritance, we show that the proposed approach has better localization properties and leads to lower false-positive rates than do single-locus analyses. Finally, we present an application of our method to a quasi-synthetic data set in which data from the CYP2D6 region are embedded within simulated data on 100K single-nucleotide polymorphisms. Analysis is quick (<5 min), and we are able to localize the causative site to a very short interval.
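The single-locus analysis that the article uses as its comparison baseline can be sketched as a per-marker chi-square scan over 2x3 case/control-by-genotype tables. This is a toy version of the baseline, not the authors' Bayesian graphical-model method, and the function name is illustrative:

```python
import numpy as np

def single_locus_scan(genotypes, phenotype):
    """Chi-square association statistic at each marker.
    genotypes: (n_subjects, n_markers) array of genotype codes 0/1/2;
    phenotype: array of 0 (control) / 1 (case). Returns one statistic per marker."""
    stats = []
    for g in genotypes.T:                      # loop over markers
        table = np.zeros((2, 3))
        for y, x in zip(phenotype, g):
            table[y, x] += 1                   # fill the 2x3 contingency table
        expected = table.sum(1, keepdims=True) * table.sum(0) / table.sum()
        mask = expected > 0                    # skip empty genotype classes
        stats.append(((table - expected) ** 2 / np.where(mask, expected, 1))[mask].sum())
    return np.array(stats)
```

A large statistic at a marker flags a candidate association; the abstract's point is that graph-based multilocus modelling localizes the causative site better than this marker-by-marker scan.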
6.
Background
Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help identify true target genes more accurately. Here, we introduce a method to integrate multiple independent studies efficiently.
7.
8.
9.
Poon AF, Lewis FI, Frost SD, Kosakovsky Pond SL. Bioinformatics (Oxford, England) 2008; 24(17): 1949-1950
Spidermonkey is a new component of the Datamonkey suite of phylogenetic tools that provides methods for detecting coevolving sites from a multiple alignment of homologous nucleotide or amino acid sequences. It reconstructs the substitution history of the alignment by maximum likelihood-based phylogenetic methods, and then analyzes the joint distribution of substitution events using Bayesian graphical models to identify significant associations among sites. AVAILABILITY: Spidermonkey is publicly available both as a web application at http://www.data-monkey.org and as a stand-alone component of the phylogenetic software package HyPhy, which is freely distributed on the web (http://www.hyphy.org) as precompiled binaries and open source.
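Spidermonkey's Bayesian graphical-model machinery is beyond a short sketch, but the underlying idea of scoring association between mapped substitution events at two sites can be illustrated with mutual information over per-branch event indicators. This is a simplified stand-in for the joint-distribution analysis the abstract describes, with illustrative names:

```python
import math

def pairwise_association(events_a, events_b):
    """Mutual information between two binary substitution-event vectors,
    one entry per branch of the tree (1 = substitution inferred on that branch)."""
    n = len(events_a)
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(events_a, events_b):
        counts[(a, b)] += 1
    mi = 0.0
    for (a, b), c in counts.items():
        if c == 0:
            continue
        pa = sum(counts[(a, x)] for x in (0, 1)) / n   # marginal for site A
        pb = sum(counts[(x, b)] for x in (0, 1)) / n   # marginal for site B
        mi += (c / n) * math.log((c / n) / (pa * pb))
    return mi
```

Sites whose substitution histories co-occur on the same branches score high; independent histories score near zero.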
10.
Functional magnetic resonance imaging (fMRI), with blood oxygenation level-dependent (BOLD) contrast, is a widely used technique for studying the human brain. However, it is an indirect measure of underlying neuronal activity, and the processes that link this activity to BOLD signals are still a topic of much debate. In order to relate findings from fMRI research to other measures of neuronal activity, it is vital to understand the underlying neurovascular coupling mechanism. Currently, there is no consensus on the relative roles of synaptic and spiking activity in the generation of the BOLD response. Here we designed a modelling framework to investigate different neurovascular coupling mechanisms. We use electroencephalographic (EEG) and fMRI data from a visual stimulation task, together with biophysically informed mathematical models describing how neuronal activity generates the BOLD signals. These models allow us to non-invasively infer the degree of local synaptic and spiking activity in the healthy human brain. In addition, we use Bayesian model comparison to decide between neurovascular coupling mechanisms. We show that the BOLD signal depends on both synaptic and spiking activity, but that the relative contributions of these two inputs depend on the underlying neuronal firing rate. When the underlying neuronal firing is low, the BOLD response is best explained by synaptic activity. However, when the neuronal firing rate is high, both synaptic and spiking activity are required to explain the BOLD signal.
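The Bayesian model comparison step can be illustrated with the Bayesian information criterion (BIC), a common asymptotic approximation to the log model evidence. This is a generic sketch of model comparison, not the specific scheme used in the study:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: approximately -2 * log model evidence.
    Lower BIC means the model is preferred; extra parameters are penalized."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

Comparing a synaptic-only coupling model against a synaptic-plus-spiking model would then reduce to comparing their BIC values on the same data: the richer model wins only if its improved fit outweighs the parameter penalty.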
11.
Background
Long-range communication is very common in proteins but the physical basis of this phenomenon remains unclear. In order to gain insight into this problem, we decided to explore whether long-range interactions exist in lattice models of proteins. Lattice models of proteins have proven to capture some of the basic properties of real proteins and, thus, can be used for elucidating general principles of protein stability and folding.
12.
13.
In the last decade, Dynamic Bayesian Networks (DBNs) have become one of the most attractive probabilistic modelling extensions of Bayesian Networks (BNs) for working under uncertainty from a temporal perspective. Despite this popularity, few researchers have attempted to study the use of these networks in anomaly detection, or the implications of data anomalies for the outcome of such models. An abnormal change in the modelled environment's data at a given time will cause a trailing chain effect on the data of all related environment variables in the current and consecutive time slices. Although this effect fades with time, it can still adversely affect the outcome of such models. In this paper we propose an algorithm for pilot error detection, using DBNs as the modelling framework for learning and detecting anomalous data. We base our experiments on the actions of an aircraft pilot, and a flight simulator is created for running the experiments. The proposed anomaly detection algorithm achieves good results in detecting pilot errors and their effects on the whole system.
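The core idea of flagging anomalous behaviour as low-probability transitions under a model learned from normal data can be sketched with a single first-order transition slice (a minimal stand-in for one inter-slice dependency of a DBN, not the paper's full model; names and the threshold are illustrative):

```python
import numpy as np

def learn_transitions(sequences, n_states):
    """Estimate a first-order transition matrix from normal-behaviour
    sequences of discretized pilot actions (with Laplace smoothing)."""
    counts = np.ones((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def flag_anomalies(seq, trans, threshold=0.05):
    """Return time indices whose transition probability falls below threshold."""
    return [t + 1 for t, (a, b) in enumerate(zip(seq, seq[1:]))
            if trans[a, b] < threshold]
```

A real DBN would condition each variable on several parents across time slices; the thresholding logic, however, is the same.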
14.
T. M. Glasby. Austral Ecology 1997; 22(4): 448-459
Several recent methods have been developed for detecting anthropogenic perturbations. Most have been analyses of data collected before and after some anthropogenic disturbance. It may, however, be more common that data can be collected only after a disturbance has occurred. In such situations, the only appropriate sampling design will often be an asymmetrical design, because it avoids problems of spatial confounding. Here, I describe in detail the steps involved in constructing asymmetrical analyses of variance, using a case study of subtidal epibiota around marinas as an example. Differences between the marina and control locations were detected for a number of taxa, but this was often only possible after post-hoc pooling of non-significant terms. Marina and control locations varied greatly from estuary to estuary, and consequently it was not possible to identify suites of species that were typical of either type of location. This result highlighted the need for multiple control locations near each marina to allow a reliable estimate of the variability among controls. Large variability among controls would mean that if differences existed between disturbed and control locations they would rarely be detected. These and other problems associated with analysing 'after' data are discussed, in addition to the precautions to take when designing environmental sampling regimes.
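The key quantity in such an asymmetrical design, the 1-degree-of-freedom contrast of the single disturbed location against the pooled controls, can be sketched as a sum-of-squares partition. This is a toy numerical illustration under assumed balanced sampling, not the full ANOVA table from the case study, and the function name is illustrative:

```python
import numpy as np

def asymmetric_partition(marina, controls):
    """Split the among-location sum of squares into a 1-df
    'marina vs controls' contrast and an 'among controls' term."""
    marina = np.asarray(marina, dtype=float)
    controls = [np.asarray(c, dtype=float) for c in controls]
    grand = np.concatenate([marina] + controls).mean()
    ss_among = sum(len(g) * (g.mean() - grand) ** 2
                   for g in [marina] + controls)
    pooled = np.concatenate(controls)
    ss_among_controls = sum(len(c) * (c.mean() - pooled.mean()) ** 2
                            for c in controls)
    # the marina-vs-controls contrast is what remains of the among-location SS
    return ss_among - ss_among_controls, ss_among_controls
```

Large variability among controls inflates the second term, which is exactly why the abstract warns that real marina effects would then rarely be detected.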
15.
Multi-species compartment epidemic models, such as the multi-species susceptible-infectious-recovered (SIR) model, are extensions of the classic SIR models, which are used to explore the transient dynamics of pathogens that infect multiple hosts in a large population. In this article, we propose a dynamical Bayesian hierarchical SIR (HSIR) model to capture the stochastic or random nature of an epidemic process in a multi-species SIR (with recovered becoming susceptible again) dynamical setting, under hidden mass balance constraints. We call this a Bayesian hierarchical multi-species SIR (MSIRB) model. Unlike a classic multi-species SIR model (which we call MSIRC), our approach imposes mass balance on the underlying true counts rather than, improperly, on the noisy observations. Moreover, the MSIRB model can capture the discrete nature of, as well as uncertainties in, the epidemic process.
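The mass-balance constraint central to the abstract can be illustrated with one deterministic, discrete-time step of a single-species SIR model in which recovered individuals become susceptible again. This is a minimal sketch, not the authors' Bayesian hierarchical multi-species model; the parameter names beta, gamma, and xi are illustrative:

```python
def sirs_step(s, i, r, beta, gamma, xi, n):
    """One discrete time step: infection (beta), recovery (gamma),
    and loss of immunity (xi, recovered become susceptible again).
    Mass balance: s + i + r stays equal to the population size n."""
    new_inf = beta * s * i / n
    new_rec = gamma * i
    new_sus = xi * r
    return s - new_inf + new_sus, i + new_inf - new_rec, r + new_rec - new_sus
```

The paper's point is that this balance should hold for the latent true counts; imposing it on noisy observed counts, as in the classic formulation, is improper.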
16.
17.
18.
Liang KC, Wang X, Anastassiou D. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2007; 4(3): 430-440
It has been shown that electropherograms of DNA sequences can be modeled with hidden Markov models (HMMs). Basecalling, the procedure that determines the sequence of bases from a given electropherogram, can then be performed using the Viterbi algorithm. A training step is required prior to basecalling in order to estimate the HMM parameters. In this paper, we propose a Bayesian approach which employs the Markov chain Monte Carlo (MCMC) method to perform basecalling. Such an approach not only allows one to naturally encode prior biological knowledge into the basecalling algorithm, it also exploits both the training data and the basecalling data in estimating the HMM parameters, leading to more accurate estimates. Using the recently sequenced genome of the organism Legionella pneumophila, we show that the MCMC basecaller outperforms the state-of-the-art basecalling algorithm in terms of total errors while requiring much less training than other proposed statistical basecallers.
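The Viterbi decoding step mentioned here is standard and can be sketched in log space (a generic HMM decoder, not the paper's basecalling-specific model):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state path for an HMM.
    obs: observed symbol indices; pi: initial state probabilities;
    A[i, j]: transition i -> j; B[state, symbol]: emission probabilities."""
    n_states, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])     # best log-prob ending in each state
    back = np.zeros((T, n_states), dtype=int)    # backpointers for path recovery
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)       # rows: previous state, cols: next
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):                # trace backpointers to the start
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In basecalling, the hidden states correspond to bases and the observations to features of the electropherogram; the MCMC approach in the paper additionally samples the HMM parameters rather than fixing them after training.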
19.
Hidden Markov models have been used to restore recorded signals of single ion channels buried in background noise. Parameter estimation and signal restoration are usually carried out through likelihood maximization using variants of the Baum-Welch forward-backward procedures. This paper presents an alternative approach to this inferential task. The inferences are made by combining the framework provided by Bayesian statistics with numerical methods based on Markov chain Monte Carlo stochastic simulation. The reliability of this approach is tested using synthetic signals of known characteristics. The expectations of the model parameters estimated here are close to those calculated using the Baum-Welch algorithm, but the present methods also yield estimates of their errors. Comparisons of the results of the Bayesian Markov chain Monte Carlo approach with those obtained by filtering and thresholding clearly demonstrate the superiority of the new methods.
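One building block of such an MCMC scheme, the conditional update of a channel's current level given the samples currently assigned to it, has a conjugate Normal form and can be sketched as follows. This is a single illustrative Gibbs update under an assumed known noise variance, not the paper's full sampler:

```python
import numpy as np

def gibbs_level_update(samples, sigma2, mu0, tau2, rng):
    """Draw one channel level from its conjugate posterior:
    Normal likelihood with known noise variance sigma2,
    Normal prior N(mu0, tau2) on the level."""
    n = len(samples)
    prec = n / sigma2 + 1.0 / tau2                      # posterior precision
    mean = (samples.sum() / sigma2 + mu0 / tau2) / prec  # posterior mean
    return rng.normal(mean, (1.0 / prec) ** 0.5)
```

Because each parameter is drawn from a full conditional distribution, the spread of the resulting draws directly quantifies the estimation error, which is the advantage over Baum-Welch point estimates highlighted in the abstract.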