Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Currently, linear mixed model analyses of expression microarray experiments are performed in either a gene-specific or a global mode. The global, or joint, analysis provides more flexibility in terms of how parameters are fitted and estimated and tends to be more powerful than the gene-specific analysis. Here we show how to implement the gene-specific linear mixed model analysis as an exact algorithm for the joint linear mixed model analysis. The gene-specific algorithm is exact when the mixed model equations can be partitioned into unrelated components: one for all global fixed and random effects and the others for the gene-specific fixed and random effects for each gene separately. This unrelatedness holds under three conditions: (1) any gene must have the same number of replicates or probes on all arrays, but these numbers can differ among genes; (2) the residual variance of the (transformed) expression data must be homogeneous or constant across genes (other variance components need not be homogeneous); and (3) the number of genes in the experiment is large. When these conditions are violated, the gene-specific algorithm is expected to be nearly exact.

2.
Microarrays provide a valuable tool for the quantification of gene expression. Usually, however, there is a limited number of replicates, leading to unsatisfactory variance estimates in a gene‐wise mixed model analysis. As thousands of genes are available, it is desirable to combine information across genes. When more than two tissue types or treatments are to be compared, it might be advisable to consider the array effect as random. Then information between arrays may be recovered, which can increase accuracy in estimation. We propose a method of variance component estimation across genes for a linear mixed model with two random effects. The method may be extended to models with more than two random effects. We assume that the variance components follow a log‐normal distribution. Assuming that the sums of squares from the gene‐wise analysis, given the true variance components, follow a scaled χ²‐distribution, we adopt an empirical Bayes approach. The variance components are estimated by the expectation of their posterior distribution. The new method is evaluated in a simulation study. Differentially expressed genes are more likely to be detected by tests based on these variance estimates than by tests based on gene‐wise variance estimates. This effect is most visible in studies with small array numbers. In an analysis of a real data set on maize endosperm, the method is shown to work well. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
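The posterior-expectation idea in this abstract can be illustrated with a deliberately simplified sketch. It assumes a single variance component (the abstract's model has two), a log-normal prior whose parameters `prior_mu` and `prior_tau` are already known, and plain grid integration; the function name and all parameter names are hypothetical and are not taken from the article.

```python
from math import exp


def posterior_mean_variance(ss, df, prior_mu, prior_tau, n_grid=2001):
    """Posterior mean of a variance sigma^2 (illustrative assumption only).

    Model assumed here: ss / sigma^2 ~ chi-square(df), and
    log(sigma^2) ~ Normal(prior_mu, prior_tau^2), i.e. a log-normal prior.
    """
    # Integrate over u = log(sigma^2) on a uniform grid around the prior.
    lo = prior_mu - 8.0 * prior_tau
    hi = prior_mu + 8.0 * prior_tau
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + k * step for k in range(n_grid)]

    # Log of (scaled chi-square likelihood x normal prior on u), up to a constant.
    log_w = [
        -0.5 * df * u - 0.5 * ss * exp(-u) - 0.5 * ((u - prior_mu) / prior_tau) ** 2
        for u in grid
    ]
    top = max(log_w)  # subtract the maximum for numerical stability
    w = [exp(v - top) for v in log_w]

    # The grid step cancels in the ratio because the grid is uniform.
    return sum(exp(u) * wi for u, wi in zip(grid, w)) / sum(w)


if __name__ == "__main__":
    # Gene-wise estimate ss / df = 4.0; prior median exp(0) = 1.0.
    print(posterior_mean_variance(ss=8.0, df=2, prior_mu=0.0, prior_tau=0.5))
```

With few degrees of freedom the estimate is pulled from the gene-wise value toward the prior, which is the borrowing of information across genes that the abstract describes.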

3.
Summary: Microarray gene expression studies over ordered categories are routinely conducted to gain insights into biological functions of genes and the underlying biological processes. Some common experiments are time‐course/dose‐response experiments where a tissue or cell line is exposed to different doses and/or durations of time to a chemical. A goal of such studies is to identify gene expression patterns/profiles over the ordered categories. This problem can be formulated as a multiple testing problem where for each gene the null hypothesis of no difference between the successive mean gene expressions is tested and further directional decisions are made if it is rejected. Most existing multiple testing procedures are devised for controlling the usual false discovery rate (FDR) rather than the mixed directional FDR (mdFDR), the expected proportion of Type I and directional errors among all rejections. Benjamini and Yekutieli (2005, Journal of the American Statistical Association 100, 71–93) proved that an augmentation of the usual Benjamini–Hochberg (BH) procedure can control the mdFDR while testing simple null hypotheses against two‐sided alternatives in terms of one‐dimensional parameters. In this article, we consider the problem of controlling the mdFDR involving multidimensional parameters. To deal with this problem, we develop a procedure extending that of Benjamini and Yekutieli based on the Bonferroni test for each gene. A proof is given for its mdFDR control when the underlying test statistics are independent across the genes. The results of a simulation study evaluating its performance under independence as well as under dependence of the underlying test statistics across the genes relative to other relevant procedures are reported. Finally, the proposed methodology is applied to time‐course microarray data obtained by Lobenhofer et al. (2002, Molecular Endocrinology 16, 1215–1229). We identified several important cell‐cycle genes, such as DNA replication/repair gene MCM4 and replication factor subunit C2, which were not identified by the previous analyses of the same data by Lobenhofer et al. (2002) and Peddada et al. (2003, Bioinformatics 19, 834–841). Although some of our findings overlap with previous findings, we identify several other genes that complement the results of Lobenhofer et al. (2002).
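As a rough illustration of the one‐dimensional starting point that this abstract attributes to Benjamini and Yekutieli (2005), the sketch below applies the BH step‐up rule to two‐sided p‐values and then assigns a direction to each rejected hypothesis from the sign of its statistic. It is not the multidimensional procedure proposed in the article, whose details the abstract does not give. The use of z‐scores and the names `two_sided_p` and `directional_bh` are assumptions made for the example.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score (assumed test statistic)."""
    return erfc(abs(z) / sqrt(2.0))


def directional_bh(z_scores, alpha=0.05):
    """BH step-up on two-sided p-values, then a sign decision per rejection.

    Returns a dict mapping the index of each rejected hypothesis to
    "up" (positive statistic) or "down" (negative statistic).
    """
    m = len(z_scores)
    pvals = [two_sided_p(z) for z in z_scores]
    order = sorted(range(m), key=lambda i: pvals[i])

    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank

    # Reject the k_max smallest p-values and declare a direction for each.
    return {i: ("up" if z_scores[i] > 0 else "down") for i in order[:k_max]}


if __name__ == "__main__":
    print(directional_bh([3.5, -3.2, 0.4, -0.9, 1.1]))
```

A directional error in this setting means rejecting a hypothesis but declaring the wrong sign; the mdFDR counts those errors together with ordinary false rejections.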

4.
We performed a genome‐wide association study using the porcine 60K SNP array to detect QTL regions for nine traits in a three‐generational Duroc sample (n = 651), viz. generations 1, 2 and 3 from a population selected over five generations using a closed nucleus breeding scheme. We applied a linear mixed model for association mapping to detect SNP effects, adjusting for fixed effects (sex and season) and random polygenic effects (reflecting genetic relatedness), and derived a likelihood ratio statistic for each SNP using the efficient mixed‐model association method. We detected a region on SSC6 for backfat thickness (BFT) and on SSC7 for cannon bone circumference (CANNON), with a genome‐wide significance of P < 0.01 after Bonferroni correction. These regions had been detected previously in other pig populations. Six genes are located in the BFT‐associated region, while the CANNON‐associated region includes 66 genes. In the future, significantly associated SNPs, derived by sequencing the coding regions of the six genes in the BFT region, can be used in marker‐assisted selection of BFT, whereas haplotypes constructed from the SSC7 region with strong LD can be used to select for the CANNON trait in our resource family.

5.
Existing methods for joint modeling of longitudinal measurements and survival data can be highly influenced by outliers in the longitudinal outcome. We propose a joint model for analysis of longitudinal measurements and competing risks failure time data which is robust in the presence of outlying longitudinal observations during follow‐up. Our model consists of a linear mixed effects sub‐model for the longitudinal outcome and a proportional cause‐specific hazards frailty sub‐model for the competing risks data, linked together by latent random effects. Instead of the usual normality assumption for measurement errors in the linear mixed effects sub‐model, we adopt a t‐distribution, which has a longer tail and thus is more robust to outliers. We derive an EM algorithm for the maximum likelihood estimates of the parameters and estimate their standard errors using a profile likelihood method. The proposed method is evaluated by simulation studies and is applied to a scleroderma lung study. (© 2009 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

6.
Missing outcomes or irregularly timed multivariate longitudinal data frequently occur in clinical trials or biomedical studies. The multivariate t linear mixed model (MtLMM) has been shown to be a robust approach to modeling multioutcome continuous repeated measures in the presence of outliers or heavy‐tailed noise. This paper presents a framework for fitting the MtLMM with an arbitrary missing data pattern embodied within multiple outcome variables recorded at irregular occasions. To address the serial correlation among the within‐subject errors, a damped exponential correlation structure is considered in the model. Under the missing at random mechanism, an efficient alternating expectation‐conditional maximization (AECM) algorithm is used to carry out estimation of parameters and imputation of missing values. The techniques for the estimation of random effects and the prediction of future responses are also investigated. Applications to an HIV‐AIDS study and a pregnancy study involving analysis of multivariate longitudinal data with missing outcomes, as well as a simulation study, have highlighted the superiority of MtLMMs in providing more adequate estimation, imputation and prediction performance.

7.
8.
A genome‐wide association study of 2098 progeny‐tested Nordic Holstein bulls genotyped for 36 387 SNPs on 29 autosomes was conducted to confirm and fine‐map quantitative trait loci (QTL) for mastitis traits identified earlier using linkage analysis with sparse microsatellite markers in the same population. We used linear mixed model analysis where a polygenic genetic effect was fitted as a random effect and single SNPs were successively included as fixed effects in the model. We detected 143 significant SNP‐by‐trait associations (P < 0.0001) on 20 chromosomes affecting mastitis‐related traits. Among them, 21 SNP‐by‐trait combinations exceeded the genome‐wide significance threshold. For 12 chromosomes, both the present association study and the previous linkage study detected QTL, and of these, six were in the same chromosomal locations. Strong associations of SNPs with mastitis traits were observed on bovine autosomes 6, 13, 14 and 20. Possible candidate genes for these QTL were identified. Identification of SNPs in linkage disequilibrium with QTL will enable marker‐based selection for mastitis resistance. The candidate genes identified should be further studied to detect candidate polymorphisms underlying these QTL.

9.
10.
The problem of variable selection in generalized linear mixed models (GLMMs) is pervasive in statistical practice. For the purpose of variable selection, many methodologies for determining the best subset of explanatory variables currently exist, differing according to the model complexity and the application. In this paper, we develop a “higher posterior probability model with bootstrap” (HPMB) approach to select explanatory variables without fitting all possible GLMMs involving a small or moderate number of explanatory variables. Furthermore, to reduce the computational load, we propose an efficient approach that uses Laplace's method and Taylor's expansion to approximate intractable integrals in GLMMs. Simulation studies and an application to HapMap data provide evidence that this selection approach is computationally feasible and reliable for exploring true candidate genes and gene–gene associations, after adjusting for complex structures among clusters.

11.
MOTIVATION: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when the number of tested genes gets large. Correlation between the test statistics attributed to gene co-regulation and dependency in the measurement errors of the gene expression levels further complicates the problem. In this paper we address this very large multiplicity problem by adopting the false discovery rate (FDR) controlling approach. In order to address the dependency problem, we present three resampling-based FDR controlling procedures that account for the distribution of the test statistics, and compare their performance to that of the naïve application of the linear step-up procedure in Benjamini and Hochberg (1995). The procedures are studied using simulated microarray data, and their performance is examined relative to their ease of implementation. RESULTS: Comparative simulation analysis shows that all four FDR controlling procedures control the FDR at the desired level, and retain substantially more power than the family-wise error rate controlling procedures. In terms of power, resampling the marginal distribution of each test statistic substantially improves the performance over the naïve procedure. The highest power is achieved, at the expense of a more sophisticated algorithm, by the resampling-based procedures that resample the joint distribution of the test statistics and estimate the level of FDR control. AVAILABILITY: An R program that adjusts p-values using FDR controlling procedures is freely available over the Internet at www.math.tau.ac.il/~ybenja.
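For orientation, the naïve linear step-up procedure of Benjamini and Hochberg (1995) that serves as the baseline here can be written as a p-value adjustment. The sketch below is a plain Python illustration, not the authors' R program, and it does not cover the resampling-based procedures; the function name `bh_adjust` and the example p-values are made up for the illustration.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    adj = bh_adjust(pvals)
    # Genes whose adjusted p-value is at most 0.05 are declared significant.
    print([i for i, q in enumerate(adj) if q <= 0.05])
```

Rejecting every hypothesis with an adjusted p-value at or below the target level gives the same set of rejections as the step-up rule applied to the raw p-values.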

12.
Matsui S, Noma H. Biometrics 2011, 67(4): 1225-1235
Summary: In microarray screening for differentially expressed genes using multiple testing, assessment of power or sample size is of particular importance to ensure that few relevant genes are removed from further consideration prematurely. In this assessment, adequate estimation of the effect sizes of differentially expressed genes is crucial because of its substantial impact on power and sample‐size estimates. However, conventional methods using top genes with largest observed effect sizes would be subject to overestimation due to random variation. In this article, we propose a simple estimation method based on hierarchical mixture models with a nonparametric prior distribution to accommodate random variation and possible large diversity of effect sizes across differential genes, separated from nuisance, nondifferential genes. Based on empirical Bayes estimates of effect sizes, the power and false discovery rate (FDR) can be estimated to monitor them simultaneously in gene screening. We also propose a power index that concerns selection of top genes with largest effect sizes, called partial power. This new power index could provide a practical compromise for the difficulty in achieving high levels of usual overall power as confronted in many microarray experiments. Applications to two real datasets from cancer clinical studies are provided.

13.
A method is proposed that aims at identifying clusters of individuals that show similar patterns when observed repeatedly. We consider linear mixed models, which are widely used for the modeling of longitudinal data. In contrast to the classical assumption of a normal distribution for the random effects, a finite mixture of normal distributions is assumed. Typically, the number of mixture components is unknown and has to be chosen, ideally by data‐driven tools. For this purpose, an EM algorithm‐based approach is considered that uses a penalized normal mixture as random effects distribution. The penalty term shrinks the pairwise distances of cluster centers based on the group lasso and the fused lasso method. The effect is that individuals with similar time trends are merged into the same cluster. The strength of regularization is determined by one penalization parameter. For finding the optimal penalization parameter, a new model choice criterion is proposed.

14.
We analyzed global patterns of expression in genes related to glutamatergic neurotransmission (glutamatergic genes) in healthy human adult brain before determining the effects of chronic alcohol and cocaine exposure on gene expression in the hippocampus. RNA‐Seq data from ‘BrainSpan’ were obtained across 16 brain regions from nine control adults. We also generated RNA‐Seq data from postmortem hippocampus from eight alcoholics, eight cocaine addicts and eight controls. Expression analyses were undertaken of 28 genes encoding glutamate ionotropic (AMPA, kainate, NMDA) and metabotropic receptor subunits, together with glutamate transporters. The expression of each gene was fairly consistent across the brain with the exception of the cerebellum, the thalamic mediodorsal nucleus and the striatum. GRIN1, encoding the essential NMDA subunit, had the highest expression across all brain regions. Six factors accounted for 84% of the variance in global gene expression. GRIN2B (encoding GluN2B) was up‐regulated in both alcoholics and cocaine addicts (FDR corrected P = 0.008). Alcoholics showed up‐regulation of three genes relative to controls and cocaine addicts: GRIA4 (encoding GluA4), GRIK3 (GluR7) and GRM4 (mGluR4). Expression of both GRM3 (mGluR3) and GRIN2D (GluN2D) was up‐regulated in alcoholics and down‐regulated in cocaine addicts relative to controls. Glutamatergic genes are moderately to highly expressed throughout the brain. Six factors explain nearly all the variance in global gene expression. At least in the hippocampus, chronic alcohol use largely up‐regulates glutamatergic genes. The NMDA GluN2B receptor subunit might be implicated in a common pathway to addiction, possibly in conjunction with the GABAB1 receptor subunit.

15.
Lin X, Ryan L, Sammel M, Zhang D, Padungtod C, Xu X. Biometrics 2000, 56(2): 593-601
We propose a scaled linear mixed model to assess the effects of exposure and other covariates on multiple continuous outcomes. The most general form of the model allows a different exposure effect for each outcome. An important special case is a model that represents the exposure effects using a common global measure that can be characterized in terms of effect sizes. Correlations among different outcomes within the same subject are accommodated using random effects. We develop two approaches to model fitting, including the maximum likelihood method and the working parameter method. A key feature of both methods is that they can be easily implemented by repeatedly calling software for fitting standard linear mixed models, e.g., SAS PROC MIXED. Compared to the maximum likelihood method, the working parameter method is easier to implement and yields fully efficient estimators of the parameters of interest. We illustrate the proposed methods by analyzing data from a study of the effects of occupational pesticide exposure on semen quality in a cohort of Chinese men.

16.
Sclerotinia sclerotiorum is a serious pathogen of numerous crops around the world. The major virulence factor of this pathogen is oxalic acid (OA). Mutants that cannot produce OA do not cause disease, and plants that express enzymes that degrade OA, such as oxalate oxidase (OxO), are very resistant to S. sclerotiorum. To examine the effect of OA on plants, we infiltrated soybean leaves with 5 mM OA and examined the gene expression changes at 2 h post‐infiltration. By comparing the gene expression levels between leaves of a transgenic soybean carrying an OxO gene (OxO) and its parent AC Colibri (AC) infiltrated with OA (pH 2.4) or water (pH 2.4 or 5.5), we were able to compare the effects of OA dependent or independent of its pH. Gene expression by microarray analysis identified 2390 genes that showed changes in expression, as determined using an overall F‐test P‐value cut‐off of 0.001. The additional requirement that at least one pairwise t‐test false discovery rate (FDR)‐corrected P value should be less than 0.001 reduced the list of the most highly significant differentially expressed genes to 1054. Independent of pH, OA altered the expression levels of 78 genes, with ferritin showing the strongest induction by OA. The combination of OA plus its low pH caused 1045 genes (99% of all significant genes) to be differentially expressed, with many of the up‐regulated genes being related to basal defence, such as genes of the phenylpropanoid pathway and various cytochrome P450s. RNA‐seq was also conducted on four samples: OxO and AC genotypes infiltrated with either OA pH 2.4 or water pH 2.4. The RNA‐seq analysis also identified ferritin paralogues as being strongly induced by OA. As the expression of ferritin, a gene that encodes an iron storage protein, is induced by free iron, these results suggest that S. sclerotiorum benefits from the ability of OA to free iron from plant proteins, as this induces host cell death, and also allows the uptake and assimilation of the iron for its own metabolic needs.

17.
Historically, sheep have been selectively bred for desirable traits including wool characteristics. However, recent moves towards extensive farming and reduced farm labour have seen a renewed interest in Easycare breeds. The aim of this study was to quantify the underlying genetic architecture of wool shedding in an Easycare flock. Wool shedding scores were collected from 565 pedigreed commercial Easycare sheep from 2002 to 2010. The wool scoring system was based on a 10‐point (0–9) scale, with score 0 for animals retaining full fleece and 9 for those completely shedding. DNA was sampled from 200 animals, of which 48 with extreme phenotypes were genotyped using a 50‐k SNP chip. Three genetic analyses were performed: heritability analysis, complex segregation analysis to test for a major gene hypothesis and a genome‐wide association study to map regions in the genome affecting the trait. Phenotypes were treated as a continuous variable, as a binary variable or as categories. High estimates of heritability (0.80 when treated as continuous, 0.65–0.75 as binary and 0.75 as categories) for shedding were obtained from linear mixed model analyses. Complex segregation analysis gave similar estimates (0.80 ± 0.06) to those above with additional evidence for a major gene with dominance effects. Mixed model association analyses identified four significant (P < 0.05) SNPs. Further analyses of these four SNPs in all 200 animals revealed that one of the SNPs displayed dominance effects similar to those obtained from the complex segregation analyses. In summary, we found strong genetic control for wool shedding, demonstrated the possibility of a single putative dominant gene controlling this trait and identified four SNPs that may be in partial linkage disequilibrium with gene(s) controlling shedding.

18.
MicroRNAs (miRNAs) regulate gene expression, with emerging data suggesting miRNAs play a role in skeletal muscle biology. We sought to examine the association of miRNAs with grip strength in a community‐based sample. Framingham Heart Study Offspring and Generation 3 participants (n = 5668; 54% women; mean age 55 years, range 24–90 years) underwent grip strength measurement and miRNA profiling using whole blood from fasting morning samples. Linear mixed‐effects regression modeling of grip strength (kg) versus continuous miRNA ‘Cq’ values and versus binary miRNA expression was performed. We conducted an integrative miRNA–mRNA coexpression analysis and examined the enrichment of biologic pathways for the top miRNAs associated with grip strength. Grip strength was lower in women than in men and declined with age, with a mean of 44.7 (10.0) kg in men and 26.5 (6.3) kg in women. Among 299 miRNAs interrogated for association with grip strength, 93 (31%) had an FDR q value < 0.05, 54 (18%) had an FDR q value < 0.01, and 15 (5%) had an FDR q value < 0.001. For almost all miRNA–grip strength associations, increasing miRNA concentration is associated with increasing grip strength. miR‐20a‐5p (FDR q = 1.8 × 10⁻⁶) had the most significant association, and several among the top 15 miRNAs had links to skeletal muscle, including miR‐126‐3p, miR‐30a‐5p, and miR‐30d‐5p. The top associated biologic pathways included metabolism, chemokine signaling, and ubiquitin‐mediated proteolysis. Our comprehensive assessment in a community‐based sample of miRNAs in blood associated with grip strength provides a framework to further our understanding of the biology of muscle strength.

19.

Background  

In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion of unchanged genes and the false discovery rate (FDR) and how to make inferences based on these concepts. Six published methods for estimating the proportion of unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value is illustrated with examples. Five published estimates of the FDR and one new are presented and tested. Implementations in R code are available.
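To make the two quantities concrete, here is a minimal sketch of one widely used nonparametric approach: a threshold-based estimate of the proportion of unchanged genes, followed by q-values computed from it. The abstract does not name the methods it reviews, so treating this particular estimator as one of them is an assumption; the function names, the cut-off `lam` and the example p-values are illustrative only, and the article's own implementations are in R.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of unchanged genes from p-values above `lam`.

    Idea: p-values of unchanged genes are uniform on [0, 1], so the share of
    p-values above lam, divided by (1 - lam), approximates that proportion.
    """
    m = len(pvalues)
    frac_above = sum(1 for p in pvalues if p > lam) / m
    return min(1.0, frac_above / (1.0 - lam))


def q_values(pvalues, pi0):
    """q-values: estimated FDR incurred when calling each gene significant."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    pvals = [0.001, 0.004, 0.01, 0.02, 0.03, 0.2, 0.6, 0.7, 0.8, 0.9]
    pi0 = estimate_pi0(pvals)
    print(pi0)
    print(q_values(pvals, pi0))
```

Setting `pi0` to 1 reduces the q-values to Benjamini-Hochberg adjusted p-values, so an estimate below 1 can only make the q-values smaller.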

20.
This paper presents an extension of the joint modeling strategy for the case of multiple longitudinal outcomes and repeated infections of different types over time, motivated by postkidney transplantation data. Our model comprises two parts linked by shared latent terms. On the one hand is a multivariate mixed linear model with random effects, where a low‐rank thin‐plate spline function is incorporated to capture the nonlinear behavior of the different profiles over time. On the other hand is an infection‐specific Cox model, where the dependence between different types of infections and the related times of infection is modeled through a random effect associated with each infection type, to capture the dependence within a type, and a shared frailty parameter, to capture the dependence between infection types. We implemented the parameterization used in joint models which uses the fitted longitudinal measurements as time‐dependent covariates in a relative risk model. Our proposed model was implemented in OpenBUGS using the MCMC approach.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号