Similar literature
1.
MOTIVATION: Finding differentially expressed genes is a fundamental objective of a microarray experiment. Numerous methods have been proposed to perform this task. Existing methods are based on point estimates of gene expression level obtained from each microarray experiment. This approach discards potentially useful information about measurement error that can be obtained from an appropriate probe-level analysis. Probabilistic probe-level models can be used to measure gene expression and also provide a level of uncertainty in this measurement. This probe-level measurement error provides useful information which can help in the identification of differentially expressed genes. RESULTS: We propose a Bayesian method to incorporate probe-level measurement error into the detection of differentially expressed genes from replicated experiments. A variational approximation is used for efficient parameter estimation. We compare this approximation with MAP and MCMC parameter estimation in terms of computational efficiency and accuracy. The method is used to calculate the probability of positive log-ratio (PPLR) of expression levels between conditions. Using the measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we test PPLR on a spike-in dataset and a mouse time-course dataset. Results show that the inclusion of probe-level measurement error improves accuracy in detecting differential gene expression. AVAILABILITY: The MAP approximation and variational inference described in this paper have been implemented in an R package pplr. The MCMC method is implemented in Matlab. Both software packages are available from http://umber.sbs.man.ac.uk/resources/puma.
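
As a rough illustration of the quantity named in this abstract, the probability of positive log-ratio can be sketched under a strong simplifying assumption: that the posterior of each condition's mean log-expression is an independent Gaussian. This is not the variational method the authors describe, and the function name `pplr` and its parameters are hypothetical.

```python
import math


def pplr(mean_a, sd_a, mean_b, sd_b):
    """Probability that log-expression in condition B exceeds condition A.

    Assumption (not from the abstract): each condition's posterior is an
    independent Gaussian with the given mean and standard deviation.
    """
    diff = mean_b - mean_a
    spread = math.sqrt(sd_a ** 2 + sd_b ** 2)
    if spread == 0.0:
        # No uncertainty at all: the answer is decided by the sign of diff.
        if diff > 0:
            return 1.0
        return 0.5 if diff == 0 else 0.0
    # Standard normal CDF evaluated at diff / spread.
    return 0.5 * (1.0 + math.erf(diff / (spread * math.sqrt(2.0))))


# Same point estimates, different measurement error:
confident = pplr(5.0, 0.1, 7.0, 0.1)
uncertain = pplr(5.0, 2.0, 7.0, 2.0)
```

With identical point estimates, the gene measured with larger error receives a PPLR closer to 0.5, which is the kind of information a point-estimate method would discard.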

2.
Liang LJ, Weiss RE. Biometrics, 2007, 63(3): 733-741
Phylogenetic modeling is computationally challenging and most phylogeny models fit a single phylogeny to a single set of molecular sequences. Individual phylogenetic analyses are typically performed independently using publicly available software that fits a computationally intensive Bayesian model using Markov chain Monte Carlo (MCMC) simulation. We develop a Bayesian hierarchical semiparametric regression model to combine multiple phylogenetic analyses of HIV-1 nucleotide sequences and estimate parameters of interest within and across analyses. We use a mixture of Dirichlet processes as a prior for the parameters to relax inappropriate parametric assumptions and to ensure the prior distribution for the parameters is continuous. We use several reweighting algorithms for combining completed MCMC analyses to shrink parameter estimates while adjusting for data set-specific covariates. This avoids constructing a large complex model involving all the original data, which would be computationally challenging and would require rewriting the existing stand-alone software.

3.
Summary. This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets. Direct application of such models to large spatial datasets is, however, computationally infeasible because of the cubic-order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts where such computations are performed for several iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects, we outline two approaches for circumventing the prohibitively expensive matrix decompositions: the first leverages analytical results from Ornstein–Uhlenbeck processes that yield computationally efficient tridiagonal structures, whereas the second derives a modified predictive process model from the original model by projecting its realizations to a lower-dimensional subspace, thereby reducing the computational burden. We illustrate the proposed methods using a synthetic dataset with additive, dominance, genetic effects and anisotropic spatial residuals, and a large dataset from a Scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial, which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.
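
To give a sense of why tridiagonal structures are computationally attractive, the sketch below solves a tridiagonal linear system in linear time with the standard Thomas algorithm, instead of the cubic cost of a dense solve. It is a generic illustration, not the authors' implementation; the function name `solve_tridiagonal` and its argument layout are assumptions made for this example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve T x = rhs for a tridiagonal matrix T in O(n) operations.

    lower[i] is T[i + 1][i], diag[i] is T[i][i], upper[i] is T[i][i + 1].
    Assumes no pivoting is needed (true, for example, when T is
    symmetric positive definite or diagonally dominant).
    """
    n = len(diag)
    if len(rhs) != n or len(lower) != n - 1 or len(upper) != n - 1:
        raise ValueError("inconsistent band lengths")
    c = [0.0] * n  # scaled super-diagonal after elimination
    d = [0.0] * n  # scaled right-hand side after elimination
    if n > 1:
        c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination: remove the sub-diagonal one row at a time.
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# 3 x 3 example: [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] x = [1, 0, 1]
solution = solve_tridiagonal([-1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0], [1.0, 0.0, 1.0])
```

Each row is visited a constant number of times, so the cost grows linearly with the number of locations.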

4.
MOTIVATION: For Affymetrix microarray platforms, gene expression is determined by computing the difference in signal intensities between perfect match (PM) and mismatch (MM) probesets. Although the use of PM is not controversial, MM probesets have been associated with variance and ultimately inaccurate gene expression calls. A principal focus of this study was to investigate the nature of the MM signal intensities and demonstrate their contribution to the experimental results. RESULTS: While most MM intensities were likely associated with random noise, a subset of approximately 20% (99,485) of the MM probes displayed high signal intensities relative to the corresponding PM probes (MM > PM) in a non-random fashion; 13,440 of these probes demonstrated exceptionally high 'outlier' intensities. About 15,938 PM probes also demonstrated exceptionally high outlier intensities consistently across all hybridizations. About 92% of the MM > PM probes had either a dThymidine (dT) or a dCytidine (dC) at the 13th position of the probe sequence. MM and PM probes displaying extremely high outlier intensities were rich in dC and low in dA at other nucleotide positions along the 25mer probe sequence. Differentially expressed genes generated using Genechip Operating System (GCOS) or modified PM-only methods were also examined. Of those candidate genes identified in the PM-only method, 157 were designated by GCOS as absent across all datasets and many others contained probes with MM > PM signal intensities. Our data suggest that subtracting MM intensity from the PM signal can be a major source of error in the analysis, leading to fewer potentially biologically important candidate genes. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
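
The PM minus MM difference and the MM > PM condition discussed in this abstract can be illustrated with a small sketch. The intensities below are made-up numbers for demonstration only, and the function names are hypothetical; this is not the GCOS algorithm.

```python
def pm_minus_mm(pm, mm):
    """Per-probe difference between perfect match and mismatch intensities."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    return [p - m for p, m in zip(pm, mm)]


def fraction_mm_exceeds_pm(pm, mm):
    """Fraction of probe pairs whose mismatch signal exceeds the perfect match."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    flagged = sum(1 for p, m in zip(pm, mm) if m > p)
    return flagged / len(pm)


# Illustrative intensities (not data from the study).
pm_values = [100.0, 200.0, 50.0, 80.0, 300.0]
mm_values = [20.0, 250.0, 10.0, 30.0, 40.0]

differences = pm_minus_mm(pm_values, mm_values)
fraction = fraction_mm_exceeds_pm(pm_values, mm_values)
```

A probe pair with MM > PM yields a negative difference, which is one way a subtraction-based expression call can go wrong.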

5.
I describe an open‐source R package, multimark, for estimation of survival and abundance from capture–mark–recapture data consisting of multiple “noninvasive” marks. Noninvasive marks include natural pelt or skin patterns, scars, and genetic markers that enable individual identification in lieu of physical capture. multimark provides a means for combining and jointly analyzing encounter histories from multiple noninvasive sources that otherwise cannot be reliably matched (e.g., left‐ and right‐sided photographs of bilaterally asymmetrical individuals). The package is currently capable of fitting open population Cormack–Jolly–Seber (CJS) and closed population abundance models with up to two mark types using Bayesian Markov chain Monte Carlo (MCMC) methods. multimark can also be used for Bayesian analyses of conventional capture–recapture data consisting of a single‐mark type. Some package features include (1) general model specification using formulas already familiar to most R users, (2) ability to include temporal, behavioral, age, cohort, and individual heterogeneity effects in detection and survival probabilities, (3) improved MCMC algorithm that is computationally faster and more efficient than previously proposed methods, (4) Bayesian multimodel inference using reversible jump MCMC, and (5) data simulation capabilities for power analyses and assessing model performance. I demonstrate use of multimark using left‐ and right‐sided encounter histories for bobcats (Lynx rufus) collected from remote single‐camera stations in southern California. In this example, there is evidence of a behavioral effect (i.e., trap “happy” response) that is otherwise indiscernible using conventional single‐sided analyses.
The package will be most useful to ecologists seeking stronger inferences by combining different sources of mark–recapture data that are difficult (or impossible) to reliably reconcile, particularly with the sparse datasets typical of rare or elusive species for which noninvasive sampling techniques are most commonly employed. Addressing deficiencies in currently available software, multimark also provides a user‐friendly interface for performing Bayesian multimodel inference using capture–recapture data consisting of a single conventional mark or multiple noninvasive marks.

6.
7.
Qianxing Mo, Faming Liang. Biometrics, 2010, 66(4): 1284-1294
Summary. ChIP‐chip experiments are procedures that combine chromatin immunoprecipitation (ChIP) and DNA microarray (chip) technology to study a variety of biological problems, including protein–DNA interaction, histone modification, and DNA methylation. The most important feature of ChIP‐chip data is that the intensity measurements of probes are spatially correlated because the DNA fragments are hybridized to neighboring probes in the experiments. We propose a simple but powerful Bayesian hierarchical approach to ChIP‐chip data through an Ising model with high‐order interactions. The proposed method naturally takes into account the intrinsic spatial structure of the data and can be used to analyze data from multiple platforms with different genomic resolutions. The model parameters are estimated using the Gibbs sampler. The proposed method is illustrated using two publicly available data sets from Affymetrix and Agilent platforms, and compared with three alternative Bayesian methods, namely, Bayesian hierarchical model, hierarchical gamma mixture model, and Tilemap hidden Markov model. The numerical results indicate that the proposed method performs as well as the other three methods for the data from Affymetrix tiling arrays, but significantly outperforms the other three methods for the data from Agilent promoter arrays. In addition, we find that the proposed method has better operating characteristics in terms of sensitivities and false discovery rates under various scenarios.

8.
We present Bayesian hierarchical models for the analysis of Affymetrix GeneChip data. The approach we take differs from other available approaches in two fundamental aspects. Firstly, we aim to integrate all processing steps of the raw data in a common statistically coherent framework, allowing all components and thus associated errors to be considered simultaneously. Secondly, inference is based on the full posterior distribution of gene expression indices and derived quantities, such as fold changes or ranks, rather than on single point estimates. Measures of uncertainty on these quantities are thus available. The models presented represent the first building block for integrated Bayesian analysis of Affymetrix GeneChip data: the models take into account additive as well as multiplicative error, gene expression levels are estimated using perfect match and a fraction of mismatch probes and are modeled on the log scale. Background correction is incorporated by modeling true signal and cross-hybridization explicitly, and the need for further normalization is considerably reduced by allowing for array-specific distributions of nonspecific hybridization. When replicate arrays are available for a condition, posterior distributions of condition-specific gene expression indices are estimated directly, by a simultaneous consideration of replicate probe sets, avoiding averaging over estimates obtained from individual replicate arrays. The performance of the Bayesian model is compared to that of standard available point estimate methods on subsets of the well-known GeneLogic and Affymetrix spike-in data. The Bayesian model is found to perform well and the integrated procedure presented appears to hold considerable promise for further development.

9.
Spatially Dependent Polya Tree Modeling for Survival Data   Total citations: 1, self-citations: 0, citations by others: 1
Summary. With the proliferation of spatially oriented time‐to‐event data, spatial modeling in the survival context has received increased recent attention. A traditional way to capture a spatial pattern is to introduce frailty terms in the linear predictor of a semiparametric model, such as proportional hazards or accelerated failure time. We propose a new methodology to capture the spatial pattern by assuming a prior based on a mixture of spatially dependent Polya trees for the baseline survival in the proportional hazards model. Thanks to modern Markov chain Monte Carlo (MCMC) methods, this approach remains computationally feasible in a fully hierarchical Bayesian framework. We compare the spatially dependent mixture of Polya trees (MPT) approach to the traditional spatial frailty approach, and illustrate the usefulness of this method with an analysis of Iowan breast cancer survival data from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute. Our method provides better goodness of fit than the traditional alternatives as measured by log pseudo marginal likelihood (LPML), the deviance information criterion (DIC), and full sample score (FSS) statistics.

10.
The linear receptive field describes a mapping from sensory stimuli to a one-dimensional variable governing a neuron's spike response. However, traditional receptive field estimators such as the spike-triggered average converge slowly and often require large amounts of data. Bayesian methods seek to overcome this problem by biasing estimates towards solutions that are more likely a priori, typically those with small, smooth, or sparse coefficients. Here we introduce a novel Bayesian receptive field estimator designed to incorporate locality, a powerful form of prior information about receptive field structure. The key to our approach is a hierarchical receptive field model that flexibly adapts to localized structure in both spacetime and spatiotemporal frequency, using an inference method known as empirical Bayes. We refer to our method as automatic locality determination (ALD), and show that it can accurately recover various types of smooth, sparse, and localized receptive fields. We apply ALD to neural data from retinal ganglion cells and V1 simple cells, and find it achieves error rates several times lower than standard estimators. Thus, estimates of comparable accuracy can be achieved with substantially less data. Finally, we introduce a computationally efficient Markov chain Monte Carlo (MCMC) algorithm for fully Bayesian inference under the ALD prior, yielding accurate Bayesian confidence intervals for small or noisy datasets.
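
For readers unfamiliar with the baseline estimator mentioned in this abstract, the spike-triggered average can be sketched as the spike-count-weighted mean of the stimulus vectors. This is the traditional estimator, not the ALD method; the function name and the toy data are assumptions made for illustration.

```python
def spike_triggered_average(stimuli, spike_counts):
    """Average stimulus vector, weighted by the spikes each stimulus evoked.

    stimuli: list of equal-length stimulus vectors, one per time bin.
    spike_counts: number of spikes observed in each time bin.
    """
    if len(stimuli) != len(spike_counts):
        raise ValueError("one spike count is required per stimulus")
    total_spikes = sum(spike_counts)
    if total_spikes == 0:
        raise ValueError("no spikes recorded, so the average is undefined")
    dim = len(stimuli[0])
    weighted_sum = [0.0] * dim
    for stimulus, count in zip(stimuli, spike_counts):
        for j in range(dim):
            weighted_sum[j] += count * stimulus[j]
    return [value / total_spikes for value in weighted_sum]


# Toy data: three two-dimensional stimuli and the spikes each evoked.
toy_stimuli = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_spikes = [2, 0, 2]
sta = spike_triggered_average(toy_stimuli, toy_spikes)
```

Because every coefficient is estimated freely from the data, this estimator uses no prior information about receptive field structure, which is the gap the abstract's Bayesian approach addresses.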

11.
Bayesian statistical methods based on simulation techniques have recently been shown to provide powerful tools for the analysis of genetic population structure. We have previously developed a Markov chain Monte Carlo (MCMC) algorithm for characterizing genetically divergent groups based on molecular markers and geographical sampling design of the dataset. However, for large-scale datasets such algorithms may get stuck in local maxima in the parameter space. Therefore, we have modified our earlier algorithm to support multiple parallel MCMC chains, with enhanced features that enable considerably faster and more reliable estimation compared to the earlier version of the algorithm. We also consider a hierarchical tree representation, from which a Bayesian model-averaged structure estimate can be extracted. The algorithm is implemented in a computer program that features a user-friendly interface and built-in graphics. The enhanced features are illustrated by analyses of simulated data and an extensive human molecular dataset. AVAILABILITY: Freely available at http://www.rni.helsinki.fi/~jic/bapspage.html.

12.
Improved efficiency of Markov chain Monte Carlo facilitates all aspects of statistical analysis with Bayesian hierarchical models. Identifying strategies to improve MCMC performance is becoming increasingly crucial as the complexity of models, and the run times to fit them, increases. We evaluate different strategies for improving MCMC efficiency using the open‐source software NIMBLE (R package nimble) using common ecological models of species occurrence and abundance as examples. We ask how MCMC efficiency depends on model formulation, model size, data, and sampling strategy. For multiseason and/or multispecies occupancy models and for N‐mixture models, we compare the efficiency of sampling discrete latent states vs. integrating over them, including more vs. fewer hierarchical model components, and univariate vs. block‐sampling methods. We include the common MCMC tool JAGS in comparisons. For simple models, there is little practical difference between computational approaches. As model complexity increases, there are strong interactions between model formulation and sampling strategy on MCMC efficiency. There is no one‐size‐fits‐all best strategy, but rather problem‐specific best strategies related to model structure and type. In all but the simplest cases, NIMBLE's default or customized performance achieves much higher efficiency than JAGS. In the two most complex examples, NIMBLE was 10–12 times more efficient than JAGS. We find NIMBLE is a valuable tool for many ecologists utilizing Bayesian inference, particularly for complex models where JAGS is prohibitively slow. Our results highlight the need for more guidelines and customizable approaches to fit hierarchical models to ensure practitioners can make the most of occupancy and other hierarchical models. 
By implementing model‐generic MCMC procedures in open‐source software, including the NIMBLE extensions for integrating over latent states (implemented in the R package nimbleEcology), we have made progress toward this aim.

13.
Li Z, Sillanpää MJ. Genetics, 2012, 190(1): 231-249
Bayesian hierarchical shrinkage methods have been widely used for quantitative trait locus mapping. From the computational perspective, the application of the Markov chain Monte Carlo (MCMC) method is not optimal for high-dimensional problems such as the ones arising in epistatic analysis. Maximum a posteriori (MAP) estimation can be a faster alternative, but it usually produces only point estimates without providing any measures of uncertainty (i.e., interval estimates). The variational Bayes method, stemming from the mean field theory in theoretical physics, is regarded as a compromise between MAP and MCMC estimation, which can be efficiently computed and produces the uncertainty measures of the estimates. Furthermore, variational Bayes methods can be regarded as the extension of traditional expectation-maximization (EM) algorithms and can be applied to a broader class of Bayesian models. Thus, the use of variational Bayes algorithms based on three hierarchical shrinkage models including Bayesian adaptive shrinkage, Bayesian LASSO, and extended Bayesian LASSO is proposed here. These methods performed generally well and were found to be highly competitive with their MCMC counterparts in our example analyses. The use of posterior credible intervals and permutation tests is considered for decision making between quantitative trait loci (QTL) and non-QTL. The performance of the presented models is also compared with R/qtlbim and R/BhGLM packages, using a previously studied simulated public epistatic data set.

14.
15.
Hierarchical generative models, such as Bayesian networks, and belief propagation have been shown to provide a theoretical framework that can account for perceptual processes, including feedforward recognition and feedback modulation. The framework explains both psychophysical and physiological experimental data and maps well onto the hierarchical distributed cortical anatomy. However, the complexity required to model cortical processes makes inference, even using approximate methods, very computationally expensive. Thus, existing object perception models based on this approach are typically limited to tree-structured networks with no loops, use small toy examples or fail to account for certain perceptual aspects such as invariance to transformations or feedback reconstruction. In this study we develop a Bayesian network with an architecture similar to that of HMAX, a biologically-inspired hierarchical model of object recognition, and use loopy belief propagation to approximate the model operations (selectivity and invariance). Crucially, the resulting Bayesian network extends the functionality of HMAX by including top-down recursive feedback. Thus, the proposed model not only achieves successful feedforward recognition invariant to noise, occlusions, and changes in position and size, but is also able to reproduce modulatory effects such as illusory contour completion and attention. Our novel and rigorous methodology covers key aspects such as learning using a layerwise greedy algorithm, combining feedback information from multiple parents and reducing the number of operations required. Overall, this work extends an established model of object recognition to include high-level feedback modulation, based on state-of-the-art probabilistic approaches. 
The methodology employed, consistent with evidence from the visual cortex, can potentially be generalized to build models of hierarchical perceptual organization that include top-down and bottom-up interactions, for example, in other sensory modalities.

16.
SUM: a new way to incorporate mismatch probe measurements   Total citations: 3, self-citations: 0, citations by others: 3
Huang S, Wang Y, Chen P, Qian HR, Yeo A, Bemis K. Genomics, 2004, 84(4): 767-777

17.
18.
19.
In this paper, we study Bayesian analysis of nonlinear hierarchical mixture models with a finite but unknown number of components. Our approach is based on Markov chain Monte Carlo (MCMC) methods. One of the applications of our method is directed to the clustering problem in gene expression analysis. From a mathematical and statistical point of view, we discuss the following topics: theoretical and practical convergence problems of the MCMC method; determination of the number of components in the mixture; and computational problems associated with likelihood calculations. In the existing literature, these problems have mainly been addressed in the linear case. One of the main contributions of this paper is developing a method for the nonlinear case. Our approach is based on a combination of methods including Gibbs sampling, random permutation sampling, birth-death MCMC, and Kullback-Leibler distance.

20.
MOTIVATION: Affymetrix GeneChips are common 3' profiling platforms for quantifying gene expression. Using publicly available datasets of expression profiles from human and mouse experiments, we sought to characterize features of GeneChip data to better compare and evaluate analyses for differential expression, regulation and clustering. We uncovered an unexpected order dependence in expression data that holds across a variety of chips in both human and mouse data. RESULTS: Order dependence among GeneChips affected relative expression measures pre-processed and normalized with the Affymetrix MAS5.0 algorithm and the robust multi-array average summarization method. The effect strongly influenced detection calls and tests for differential expression and can potentially significantly bias experimental results based on GeneChip profiling.
