20 similar documents found (search time: 7 ms)
1.
2.
Hierarchical Bayes models for cDNA microarray gene expression (total citations: 2; self-citations: 0; citations by others: 2)
cDNA microarrays are used in many contexts to compare mRNA levels between samples of cells. Microarray experiments typically give us expression measurements on 1,000 to 20,000 genes, but with few replicates for each gene. Traditional methods using means and standard deviations to detect differential expression are not satisfactory in this context. A handful of alternative statistics have been developed, including several empirical Bayes methods. In this paper we present two full hierarchical Bayes models for detecting gene expression, of which one (D) describes our microarray data very well. We also compare the full Bayes and empirical Bayes approaches with respect to model assumptions, false discovery rates and computer running time. The proposed models are compared to existing empirical Bayes models in a simulation study and for a set of data (Yuen et al., 2002), where 27 genes have been categorized by quantitative real-time PCR. It turns out that the existing empirical Bayes methods have at least as good performance as the full Bayes ones.
3.
4.
5.
6.
7.
8.
On gene ranking using replicated microarray time course data (total citations: 1; self-citations: 0; citations by others: 1)
Summary. Consider the ranking of genes using data from replicated microarray time course experiments, where there are multiple biological conditions, and the genes of interest are those whose temporal profiles differ across conditions. We derive a multisample multivariate empirical Bayes' statistic for ranking genes in the order of differential expression, from both longitudinal and cross-sectional replicated developmental microarray time course data. Our longitudinal multisample model assumes that time course replicates are independent and identically distributed multivariate normal vectors. On the other hand, we construct a cross-sectional model using a normal regression framework with any appropriate basis for the design matrices. In both cases, we use natural conjugate priors in our empirical Bayes' setting which guarantee closed form solutions for the posterior odds. The simulations and two case studies using published worm and mouse microarray time course datasets indicate that the proposed approaches perform satisfactorily.
9.
Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models (total citations: 3; self-citations: 0; citations by others: 3)
The problem of evaluating the goodness of the predictive distributions of hierarchical Bayesian and empirical Bayes models is investigated. A Bayesian predictive information criterion is proposed as an estimator of the posterior mean of the expected loglikelihood of the predictive distribution when the specified family of probability distributions does not contain the true distribution. The proposed criterion is developed by correcting the asymptotic bias of the posterior mean of the loglikelihood as an estimator of its expected loglikelihood. In the evaluation of hierarchical Bayesian models with random effects, regardless of our parametric focus, the proposed criterion considers the bias correction of the posterior mean of the marginal loglikelihood because it requires a consistent parameter estimator. The use of the bootstrap in model evaluation is also discussed.
10.
The expression microarray is a frequently used approach to study gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by "batch effects," the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy and overall performance. ComBat, an Empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, due to a sizeable probe effect in microarray data that can inflate the correlation among replicates and unrelated samples.
11.
Edwards JW, Page GP, Gadbury G, Heo M, Kayo T, Weindruch R, Allison DB. Functional & integrative genomics, 2005, 5(1): 32-39
Micro-array technology allows investigators the opportunity to measure expression levels of thousands of genes simultaneously. However, investigators are also faced with the challenge of simultaneous estimation of gene expression differences for thousands of genes with very small sample sizes. Traditional estimators of differences between treatment means (ordinary least squares estimators or OLS) are not the best estimators if interest is in estimation of gene expression differences for an ensemble of genes. In the case that gene expression differences are regarded as exchangeable samples from a common population, estimators are available that result in much smaller average mean-square error across the population of gene expression difference estimates. We have simulated the application of such an estimator, namely an empirical Bayes (EB) estimator of random effects in a hierarchical linear model (normal-normal). Simulation results revealed mean-square error as low as 0.05 times the mean-square error of OLS estimators (i.e., the difference between treatment means). We applied the analysis to an example dataset as a demonstration of the shrinkage of EB estimators and of the reduction in mean-square error, i.e., increase in precision, associated with EB estimators in this analysis. The method described here is available in software that is available at .
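The shrinkage idea in this abstract can be illustrated with a minimal sketch of a normal-normal empirical Bayes estimator. This is not the authors' software. It assumes, for illustration only, that the sampling variance of each per-gene difference is known and equal across genes, and it estimates the between-gene variance by the method of moments. The function names `eb_shrink` and `simulate` and all parameter values are hypothetical.

```python
import random
import statistics


def eb_shrink(diffs, sampling_var):
    """Shrink per-gene differences toward their common mean.

    Assumed model (illustrative): d_i | theta_i ~ N(theta_i, sampling_var),
    theta_i ~ N(mu, tau2), with mu and tau2 estimated from the data.
    """
    mu_hat = statistics.fmean(diffs)
    # Method of moments: Var(d_i) = tau2 + sampling_var, truncated at zero.
    tau2_hat = max(0.0, statistics.variance(diffs) - sampling_var)
    # Weight on the raw (OLS) difference; the rest goes to the common mean.
    weight = tau2_hat / (tau2_hat + sampling_var)
    return [mu_hat + weight * (d - mu_hat) for d in diffs]


def mean_squared_error(estimates, truth):
    return statistics.fmean([(e - t) ** 2 for e, t in zip(estimates, truth)])


def simulate(n_genes=2000, tau=0.5, sigma=1.0, n_per_group=3, seed=0):
    """Return (MSE of OLS differences, MSE of EB estimates) on simulated data."""
    rng = random.Random(seed)
    # Variance of a difference between two group means of n_per_group replicates.
    sampling_var = 2 * sigma ** 2 / n_per_group
    truth = [rng.gauss(0.0, tau) for _ in range(n_genes)]
    ols = [t + rng.gauss(0.0, sampling_var ** 0.5) for t in truth]
    eb = eb_shrink(ols, sampling_var)
    return mean_squared_error(ols, truth), mean_squared_error(eb, truth)


if __name__ == "__main__":
    mse_ols, mse_eb = simulate()
    print(f"OLS MSE: {mse_ols:.3f}  EB MSE: {mse_eb:.3f}")
```

The gain from shrinkage grows as the sampling variance becomes large relative to the between-gene variance, which is the few-replicates setting the abstract describes.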
12.
A continuous empirical Bayes smoothing technique (total citations: 1; self-citations: 0; citations by others: 1)
13.
14.
15.
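Single-cell RNA sequencing (scRNA-seq) is a transformative biotechnology that resolves tissue complexity at unprecedented resolution and answers questions that bulk RNA sequencing cannot. However, the high throughput and complexity of single-cell data make analysis very difficult, and handling batch effects (BEs) is one of the main challenges. Batch effects are technical biases in the analysis of high-throughput biological data; their sources and their handling are highly complex and study-dependent. Depending on tissue type, sequencing technology, and experimental design, sequencing data require different workflows for evaluating, analyzing, measuring, and handling batch effects in order to process them effectively. Evaluating batch effects is easily overlooked in single-cell data analysis, yet it helps determine the source of the batch effects, how much of the variation in the data they explain, how strongly they affect the analysis results, and how they should be handled; it is the foundation for handling batch effects effectively. This review therefore focuses on batch effects in single-cell transcriptome data, discussing the concept of batch effects, how they differ from batch effects in bulk transcriptome data, evaluation methods, and the challenges faced, and offers an outlook on future developments.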
16.
17.
18.
Zhongwei Zhang, Reinaldo B. Arellano-Valle, Marc G. Genton, Raphaël Huser. Biometrics, 2023, 79(3): 1788-1800
Correlated binary response data with covariates are ubiquitous in longitudinal or spatial studies. Among the existing statistical models, the most well-known one for this type of data is the multivariate probit model, which uses a Gaussian link to model dependence at the latent level. However, a symmetric link may not be appropriate if the data are highly imbalanced. Here, we propose a multivariate skew-elliptical link model for correlated binary responses, which includes the multivariate probit model as a special case. Furthermore, we perform Bayesian inference for this new model and prove that the regression coefficients have a closed-form unified skew-elliptical posterior with an elliptical prior. The new methodology is illustrated by an application to COVID-19 data from three different counties of the state of California, USA. By jointly modeling extreme spikes in weekly new cases, our results show that the spatial dependence cannot be neglected. Furthermore, the results also show that the skewed latent structure of our proposed model improves the flexibility of the multivariate probit model and provides a better fit to our highly imbalanced dataset.
19.
Tumor classification is a well-studied problem in the field of bioinformatics. Developments in the field of DNA chip design have now made it possible to measure the expression levels of thousands of genes in sample tissue from healthy cell lines or tumors. A number of studies have examined the problems of tumor classification: class discovery, the problem of defining a number of classes of tumors using the data from a DNA chip, and class prediction, the problem of accurately classifying an unknown tumor, given expression data from the unknown tumor and from a learning set. The current work has applied phylogenetic methods to both problems. To solve the class discovery problem, we impose a metric on a set of tumors as a function of their gene expression levels, and impose a tree structure on this metric, using standard tree fitting methods borrowed from the field of phylogenetics. Phylogenetic methods provide a simple way of imposing a clear hierarchical relationship on the data, with branch lengths in the classification tree representing the degree of separation witnessed. We tested our method for class discovery on two data sets: a data set of 87 tissues, comprised mostly of small, round, blue-cell tumors (SRBCTs), and a data set of 22 breast tumors. We fit the 87 samples of the first set to a classification tree, which neatly separated into four major clusters corresponding exactly to the four groups of tumors, namely neuroblastomas, rhabdomyosarcomas, Burkitt's lymphomas, and the Ewing's family of tumors. The classification tree built using the breast cancer data separated tumors with BRCA1 mutations from those with BRCA2 mutations, with sporadic tumors separated from both groups and from each other. We also demonstrate the flexibility of the class discovery method with regard to standard resampling methodology such as jackknifing and noise perturbation. 
To solve the class prediction problem, we built a classification tree on the learning set, and then sought the optimal placement of each test sample within the classification tree. We tested this method on the SRBCT data set, and classified each tumor successfully.
20.
A fundamental issue in quantitative trait locus (QTL) mapping is to determine the plausibility of the presence of a QTL at a given genome location. Bayesian analysis offers an attractive way of testing alternative models (here, QTL vs. no-QTL) via the Bayes factor. There have been several numerical approaches to computing the Bayes factor, mostly based on Markov Chain Monte Carlo (MCMC), but these strategies are subject to numerical or stability problems. We propose a simple and stable approach to calculating the Bayes factor between nested models. The procedure is based on a reparameterization of a variance component model in terms of intra-class correlation. The Bayes factor can then be easily calculated from the output of an MCMC scheme by averaging conditional densities at the null intra-class correlation. We studied the performance of the method using simulation. We applied this approach to QTL analysis in an outbred population. We also compared it with the Likelihood Ratio Test and we analyzed its stability. Simulation results were very similar to the simulated parameters. The posterior probability of the QTL model increases as the QTL effect does. The location of the QTL was also correctly obtained. The use of meta-analysis is suggested from the properties of the Bayes factor.
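The idea of obtaining a Bayes factor between nested models by averaging conditional densities at the null value can be sketched on a toy problem. The example below is not the variance component model of the abstract. It swaps in a simple normal-mean model, assumed here purely for illustration, and uses the Savage-Dickey density ratio with a Gibbs sampler. The function name `savage_dickey_bf01`, the priors, and all default values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def savage_dickey_bf01(y, prior_var=1.0, a=2.0, b=2.0,
                       n_iter=4000, burn_in=500, seed=1):
    """Bayes factor for theta = 0 against theta != 0 in a toy model.

    Assumed model (illustrative): y_i ~ N(theta, sigma2),
    theta ~ N(0, prior_var), sigma2 ~ InvGamma(shape=a, scale=b).
    """
    rng = random.Random(seed)
    n, total = len(y), sum(y)
    theta, density_sum, kept = 0.0, 0.0, 0
    for it in range(n_iter):
        # Gibbs step: sigma2 | theta, y ~ InvGamma(a + n/2, b + SSE/2).
        sse = sum((yi - theta) ** 2 for yi in y)
        sigma2 = 1.0 / rng.gammavariate(a + n / 2, 1.0 / (b + sse / 2))
        # Gibbs step: theta | sigma2, y is normal (conjugate update).
        post_var = 1.0 / (n / sigma2 + 1.0 / prior_var)
        post_mean = post_var * total / sigma2
        theta = rng.gauss(post_mean, math.sqrt(post_var))
        if it >= burn_in:
            # Conditional posterior density of theta evaluated at the null.
            density_sum += normal_pdf(0.0, post_mean, post_var)
            kept += 1
    posterior_at_null = density_sum / kept
    prior_at_null = normal_pdf(0.0, 0.0, prior_var)
    # Savage-Dickey ratio: posterior density over prior density at the null.
    return posterior_at_null / prior_at_null


if __name__ == "__main__":
    centered = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5] * 5
    shifted = [v + 2.0 for v in centered]
    print("BF01, data centered at 0:", savage_dickey_bf01(centered))
    print("BF01, data shifted by 2:", savage_dickey_bf01(shifted))
```

Averaging the conditional density over the sampler's draws avoids estimating the posterior density at the null from a histogram of draws, which is unreliable when few draws fall near the null value.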