首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Summary In a microarray experiment, one experimental design is used to obtain expression measures for all genes. One popular analysis method involves fitting the same linear mixed model for each gene, obtaining gene‐specific p‐values for tests of interest involving fixed effects, and then choosing a threshold for significance that is intended to control false discovery rate (FDR) at a desired level. When one or more random factors have zero variance components for some genes, the standard practice of fitting the same full linear mixed model for all genes can result in failure to control FDR. We propose a new method that combines results from the fit of full and selected linear mixed models to identify differentially expressed genes and provide FDR control at target levels when the true underlying random effects structure varies across genes.  相似文献   

2.
Microarrays provide a valuable tool for the quantification of gene expression. Usually, however, there is a limited number of replicates leading to unsatisfying variance estimates in a gene‐wise mixed model analysis. As thousands of genes are available, it is desirable to combine information across genes. When more than two tissue types or treatments are to be compared it might be advisable to consider the array effect as random. Then information between arrays may be recovered, which can increase accuracy in estimation. We propose a method of variance component estimation across genes for a linear mixed model with two random effects. The method may be extended to models with more than two random effects. We assume that the variance components follow a log‐normal distribution. Assuming that the sums of squares from the gene‐wise analysis, given the true variance components, follow a scaled χ2‐distribution, we adopt an empirical Bayes approach. The variance components are estimated by the expectation of their posterior distribution. The new method is evaluated in a simulation study. Differentially expressed genes are more likely to be detected by tests based on these variance estimates than by tests based on gene‐wise variance estimates. This effect is most visible in studies with small array numbers. Analyzing a real data set on maize endosperm the method is shown to work well. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

3.
The augmentation of categorical outcomes with underlying Gaussian variables in bivariate generalized mixed effects models has facilitated the joint modeling of continuous and binary response variables. These models typically assume that random effects and residual effects (co)variances are homogeneous across all clusters and subjects, respectively. Motivated by conflicting evidence about the association between performance outcomes in dairy production systems, we consider the situation where these (co)variance parameters may themselves be functions of systematic and/or random effects. We present a hierarchical Bayesian extension of bivariate generalized linear models whereby functions of the (co)variance matrices are specified as linear combinations of fixed and random effects following a square‐root‐free Cholesky reparameterization that ensures necessary positive semidefinite constraints. We test the proposed model by simulation and apply it to the analysis of a dairy cattle data set in which the random herd‐level and residual cow‐level effects (co)variances between a continuous production trait and binary reproduction trait are modeled as functions of fixed management effects and random cluster effects.  相似文献   

4.
5.
Kinney SK  Dunson DB 《Biometrics》2007,63(3):690-698
We address the problem of selecting which variables should be included in the fixed and random components of logistic mixed effects models for correlated data. A fully Bayesian variable selection is implemented using a stochastic search Gibbs sampler to estimate the exact model-averaged posterior distribution. This approach automatically identifies subsets of predictors having nonzero fixed effect coefficients or nonzero random effects variance, while allowing uncertainty in the model selection process. Default priors are proposed for the variance components and an efficient parameter expansion Gibbs sampler is developed for posterior computation. The approach is illustrated using simulated data and an epidemiologic example.  相似文献   

6.
A mixed-model procedure for analysis of censored data assuming a multivariate normal distribution is described. A Bayesian framework is adopted which allows for estimation of fixed effects and variance components and prediction of random effects when records are left-censored. The procedure can be extended to right- and two-tailed censoring. The model employed is a generalized linear model, and the estimation equations resemble those arising in analysis of multivariate normal or categorical data with threshold models. Estimates of variance components are obtained using expressions similar to those employed in the EM algorithm for restricted maximum likelihood (REML) estimation under normality.  相似文献   

7.
Summary It is of great practical interest to simultaneously identify the important predictors that correspond to both the fixed and random effects components in a linear mixed‐effects (LME) model. Typical approaches perform selection separately on each of the fixed and random effect components. However, changing the structure of one set of effects can lead to different choices of variables for the other set of effects. We propose simultaneous selection of the fixed and random factors in an LME model using a modified Cholesky decomposition. Our method is based on a penalized joint log likelihood with an adaptive penalty for the selection and estimation of both the fixed and random effects. It performs model selection by allowing fixed effects or standard deviations of random effects to be exactly zero. A constrained expectation–maximization algorithm is then used to obtain the final estimates. It is further shown that the proposed penalized estimator enjoys the Oracle property, in that, asymptotically it performs as well as if the true model was known beforehand. We demonstrate the performance of our method based on a simulation study and a real data example.  相似文献   

8.
A class of generalized linear mixed models can be obtained by introducing random effects in the linear predictor of a generalized linear model, e.g. a split plot model for binary data or count data. Maximum likelihood estimation, for normally distributed random effects, involves high-dimensional numerical integration, with severe limitations on the number and structure of the additional random effects. An alternative estimation procedure based on an extension of the iterative re-weighted least squares procedure for generalized linear models will be illustrated on a practical data set involving carcass classification of cattle. The data is analysed as overdispersed binomial proportions with fixed and random effects and associated components of variance on the logit scale. Estimates are obtained with standard software for normal data mixed models. Numerical restrictions pertain to the size of matrices to be inverted. This can be dealt with by absorption techniques familiar from e.g. mixed models in animal breeding. The final model fitted to the classification data includes four components of variance and a multiplicative overdispersion factor. Basically the estimation procedure is a combination of iterated least squares procedures and no full distributional assumptions are needed. A simulation study based on the classification data is presented. This includes a study of procedures for constructing confidence intervals and significance tests for fixed effects and components of variance. The simulation results increase confidence in the usefulness of the estimation procedure.  相似文献   

9.
Estimation of variance components in linear mixed models is important in clinical trial and longitudinal data analysis. It is also important in animal and plant breeding for accurately partitioning total phenotypic variance into genetic and environmental variances. Restricted maximum likelihood (REML) method is often preferred over the maximum likelihood (ML) method for variance component estimation because REML takes into account the lost degree of freedom resulting from estimating the fixed effects. The original restricted likelihood function involves a linear transformation of the original response variable (a collection of error contrasts). Harville's final form of the restricted likelihood function does not involve the transformation and thus is much easier to manipulate than the original restricted likelihood function. There are several different ways to show that the two forms of the restricted likelihood are equivalent. In this study, I present a much simpler way to derive Harville's restricted likelihood function. I first treat the fixed effects as random effects and call such a mixed model a pseudo random model (PDRM). I then construct a likelihood function for the PDRM. Finally, I let the variance of the pseudo random effects be infinity and show that the limit of the likelihood function of the PDRM is the restricted likelihood function.  相似文献   

10.
Identifying differential expressed genes across various conditions or genotypes is the most typical approach to studying the regulation of gene expression. An estimate of gene-specific variance is often needed for the assessment of statistical significance in most differential expression (DE) detection methods, including linear models (e.g., for transformed and normalized microarray data) and generalized linear models (e.g., for count data in RNAseq). Due to a common limit in sample size, the variance estimate is often unstable in small experiments. Shrinkage estimates using empirical Bayes methods have proven useful in improving the variance estimate, hence improving the detection of DE. The most widely used empirical Bayes methods borrow information across genes within the same experiments. In these methods, genes are considered exchangeable or exchangeable conditioning on expression level. We propose, with the increasing accumulation of expression data, borrowing information from historical data on the same gene can provide better estimate of gene-specific variance, thus further improve DE detection. Specifically, we show that the variation of gene expression is truly gene-specific and reproducible between different experiments. We present a new method to establish informative gene-specific prior on the variance of expression using existing public data, and illustrate how to shrink the variance estimate and detect DE. We demonstrate improvement in DE detection under our strategy compared to leading DE detection methods.  相似文献   

11.
Liu D  Lin X  Ghosh D 《Biometrics》2007,63(4):1079-1088
We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations.  相似文献   

12.
Combining information across genes in the statistical analysis of microarray data is desirable because of the relatively small number of data points obtained for each individual gene. Here we develop an estimator of the error variance that can borrow information across genes using the James-Stein shrinkage concept. A new test statistic (FS) is constructed using this estimator. The new statistic is compared with other statistics used to test for differential expression: the gene-specific F test (F1), the pooled-variance F statistic (F3), a hybrid statistic (F2) that uses the average of the individual and pooled variances, the regularized t-statistic, the posterior odds statistic B, and the SAM t-test. The FS-test shows best or nearly best power for detecting differentially expressed genes over a wide range of simulated data in which the variance components associated with individual genes are either homogeneous or heterogeneous. Thus FS provides a powerful and robust approach to test differential expression of genes that utilizes information not available in individual gene testing approaches and does not suffer from biases of the pooled variance approach.  相似文献   

13.
A procedure is presented for constructing an exact confidence interval for the ratio of the two variance components in a possibly unbalanced mixed linear model that contains a single set of m random effects. This procedure can be used in animal and plant breeding problems to obtain an exact confidence interval for a heritability. The confidence interval can be defined in terms of the output of a least squares analysis. It can be computed by a graphical or iterative technique requiring the diagonalization of an m X m matrix or, alternatively, the inversion of a number of m X m matrices. Confidence intervals that are approximate can be obtained with much less computational burden, using either of two approaches. The various confidence interval procedures can be extended to some problems in which the mixed linear model contains more than one set of random effects. Corresponding to each interval procedure is a significance test and one or more estimators.  相似文献   

14.
Summary .   For longitudinal data, mixed models include random subject effects to indicate how subjects influence their responses over repeated assessments. The error variance and the variance of the random effects are usually considered to be homogeneous. These variance terms characterize the within-subjects (i.e., error variance) and between-subjects (i.e., random-effects variance) variation in the data. In studies using ecological momentary assessment (EMA), up to 30 or 40 observations are often obtained for each subject, and interest frequently centers around changes in the variances, both within and between subjects. In this article, we focus on an adolescent smoking study using EMA where interest is on characterizing changes in mood variation. We describe how covariates can influence the mood variances, and also extend the standard mixed model by adding a subject-level random effect to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their mood responses. Additionally, we allow the location and scale random effects to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on the joint modeling of the mean and variance structure.  相似文献   

15.
A popular way to represent clustered binary, count, or other data is via the generalized linear mixed model framework, which accommodates correlation through incorporation of random effects. A standard assumption is that the random effects follow a parametric family such as the normal distribution; however, this may be unrealistic or too restrictive to represent the data. We relax this assumption and require only that the distribution of random effects belong to a class of 'smooth' densities and approximate the density by the seminonparametric (SNP) approach of Gallant and Nychka (1987). This representation allows the density to be skewed, multi-modal, fat- or thin-tailed relative to the normal and includes the normal as a special case. Because an efficient algorithm to sample from an SNP density is available, we propose a Monte Carlo EM algorithm using a rejection sampling scheme to estimate the fixed parameters of the linear predictor, variance components and the SNP density. The approach is illustrated by application to a data set and via simulation.  相似文献   

16.
Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficient and variance component parameters when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and slopes with either covariates or both covariates and outcome contain missing information. In the current paper, we compared the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. Simulations were motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings showed that the relative performance of MI methods vary according to whether the incomplete covariate has fixed or random effects and whether there is missingnesss in the outcome variable. We showed that compatible imputation and analysis models resulted in consistent estimation of both regression parameters and variance components via simulation. We illustrate our findings with the analysis of LSAC data.  相似文献   

17.
Evaluation of the gene-specific dye bias in cDNA microarray experiments   总被引:2,自引:0,他引:2  
MOTIVATION: In cDNA microarray experiments all samples are labeled with either Cy3 or Cy5. Systematic and gene-specific dye bias effects have been observed in dual-color experiments. In contrast to systematic effects which can be corrected by a normalization method, the gene-specific dye bias is not completely suppressed and may alter the conclusions about the differentially expressed genes. METHODS: The gene-specific dye bias is taken into account using an analysis of variance model. We propose an index, named label bias index, to measure the gene-specific dye bias. It requires at least two self-self hybridization cDNA microarrays. RESULTS: After lowess normalization we have found that the gene-specific dye bias is the major source of experimental variability between replicates. The ratio (R/G) may exceed 2. As a consequence false positive genes may be found in direct comparison without dye-swap. The stability of this artifact and its consequences on gene variance and on direct or indirect comparisons are addressed. AVAILABILITY: http://www.inapg.inra.fr/ens_rech/mathinfo/recherche/mathematique  相似文献   

18.
Yuanjia Wang  Huaihou Chen 《Biometrics》2012,68(4):1113-1125
Summary We examine a generalized F ‐test of a nonparametric function through penalized splines and a linear mixed effects model representation. With a mixed effects model representation of penalized splines, we imbed the test of an unspecified function into a test of some fixed effects and a variance component in a linear mixed effects model with nuisance variance components under the null. The procedure can be used to test a nonparametric function or varying‐coefficient with clustered data, compare two spline functions, test the significance of an unspecified function in an additive model with multiple components, and test a row or a column effect in a two‐way analysis of variance model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm for computing the null distribution of the test, which significantly improves the computational efficiency over bootstrap. The spectral representation reveals a connection between the likelihood ratio test (LRT) in a multiple variance components model and a single component model. We examine our methods through simulations, where we show that the power of the generalized F ‐test may be higher than the LRT, depending on the hypothesis of interest and the true model under the alternative. We apply these methods to compute the genome‐wide critical value and p ‐value of a genetic association test in a genome‐wide association study (GWAS), where the usual bootstrap is computationally intensive (up to 108 simulations) and asymptotic approximation may be unreliable and conservative.  相似文献   

19.
The traditional method for estimating the linear function of fixed parameters in mixed linear model is a two-stage procedure. In the first stage of this procedure the variance components estimators are calculated and next in the second stage these estimators are taken as true values of variance components to estimating the linear function of fixed parameters according to generalized least squares method. In this paper the general mixed linear model is considered in which a matrix related to fixed parameters and or/a dispersion matrix of observation vector may be deficient in rank. It is shown that the estimators of a set of functions of fixed parameters obtained in second stage are unbiased if only the observation vector is symmetrically distributed about its expected value and the estimators of variance components from first stage are translation-invariant and are even functions of the observation vector.  相似文献   

20.
We propose a general Bayesian approach to heteroskedastic error modeling for generalized linear mixed models (GLMM) in which linked functions of conditional means and residual variances are specified as separate linear combinations of fixed and random effects. We focus on the linear mixed model (LMM) analysis of birth weight (BW) and the cumulative probit mixed model (CPMM) analysis of calving ease (CE). The deviance information criterion (DIC) was demonstrated to be useful in correctly choosing between homoskedastic and heteroskedastic error GLMM for both traits when data was generated according to a mixed model specification for both location parameters and residual variances. Heteroskedastic error LMM and CPMM were fitted, respectively, to BW and CE data on 8847 Italian Piemontese first parity dams in which residual variances were modeled as functions of fixed calf sex and random herd effects. The posterior mean residual variance for male calves was over 40% greater than that for female calves for both traits. Also, the posterior means of the standard deviation of the herd-specific variance ratios (relative to a unitary baseline) were estimated to be 0.60 ± 0.09 for BW and 0.74 ± 0.14 for CE. For both traits, the heteroskedastic error LMM and CPMM were chosen over their homoskedastic error counterparts based on DIC values.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号