Similar articles
 Found 20 similar articles; search took 31 ms
1.
Pavlidis P  Noble WS 《Genome biology》2001,2(10):research0042.1-research0042.15

Background  

We performed a statistical analysis of a previously published set of gene expression microarray data from six different brain regions in two mouse strains. In the previous analysis, 24 genes showing expression differences between the strains and about 240 genes with regional differences in expression were identified. Like many gene expression studies, that analysis relied primarily on ad hoc 'fold change' and 'absent/present' criteria to select genes. To determine whether statistically motivated methods would give a more sensitive and selective analysis of gene expression patterns in the brain, we decided to use analysis of variance (ANOVA) and feature selection methods designed to select genes showing strain- or region-dependent patterns of expression.

2.
Heart failure (HF) is the major cause of mortality and morbidity in the developed world. Gene expression profiles of animal models of heart failure have been used in a number of studies to understand human cardiac disease. In this study, statistical methods for analysing microarray data on cardiac tissues from dogs with pacing-induced HF were used to identify differentially expressed genes between normal and two abnormal tissues. The unsupervised techniques principal component analysis (PCA) and cluster analysis were explored to distinguish between three different groups of 12 arrays and to separate the genes that are up-regulated in different conditions among the 23912 genes in the heart failure canines' microarray data. Of the 23912 genes, 1802 were differentially expressed in the three groups at the 5% level of significance and 496 at the 1% level of significance using one-way analysis of variance (ANOVA). The genes clustered using PCA and cluster analysis were explored to understand HF, and a small number of differentially expressed genes related to HF were identified.

3.
Identifying differentially expressed (DE) genes across conditions or treatments is a typical problem in microarray experiments. In time course microarray experiments (under two or more conditions/treatments), it is sometimes of interest to identify two classes of DE genes: those with no time-condition interactions (called parallel DE genes, or PDE), and those with time-condition interactions (nonparallel DE genes, NPDE). Although many methods have been proposed for identifying DE genes in time course experiments, methods for discerning NPDE genes from the general DE genes are still lacking. We propose a functional ANOVA mixed-effect model to model time course gene expression observations. The fixed effect (the mean curve) of the model decomposes bivariate functions of time and treatments (or experimental conditions) as in the classic ANOVA method and provides the associated notions of main effects and interactions. Random effects capture time-dependent correlation structures. In this model, identifying NPDE genes is equivalent to testing the significance of the time-condition interaction, for which an approximate F-test is suggested. We examined the performance of the proposed method on simulated datasets in comparison with some existing methods, and applied the method to a study of human reaction to endotoxin stimulation, as well as to a cell cycle expression data set.

4.
Combining multiple microarrays in the presence of controlling variables   (total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: Microarray technology enables the monitoring of expression levels for thousands of genes simultaneously. When the magnitude of the experiment increases, it becomes common to use the same type of microarrays from different laboratories or hospitals. Thus, it is important to analyze microarray data together to derive a combined conclusion after accounting for the differences. One of the main objectives of the microarray experiment is to identify differentially expressed genes among the different experimental groups. The analysis of variance (ANOVA) model has been commonly used to detect differentially expressed genes after accounting for the sources of variation commonly observed in the microarray experiment. RESULTS: We extended the usual ANOVA model to account for additional variability resulting from many confounding variables such as the effect of different hospitals. The proposed model is a two-stage ANOVA model. The first stage is the adjustment for the effects of no interest. The second stage is the detection of differentially expressed genes among the experimental groups using the residuals obtained from the first stage. Based on these residuals, we propose a permutation test to detect the differentially expressed genes. The proposed model is illustrated using the data from 133 microarrays collected at three different hospitals. The proposed approach is more flexible to use, and it is easier to accommodate the individual covariates in this model than using the meta-analysis approach. AVAILABILITY: A set of programs written in R will be electronically sent upon request.
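
The two-stage idea (adjust for a nuisance effect, then run a permutation test on the residuals) can be sketched for a single gene as follows. This is an illustration under stated assumptions, not the authors' R programs: the nuisance effect is reduced to a per-hospital mean, the test statistic is an ordinary one-way F, and all function names, labels, and data are hypothetical.

```python
import random
from statistics import mean

def residuals_after_site(values, sites):
    """Stage 1: remove the per-site (e.g. hospital) mean from each value."""
    site_mean = {s: mean([v for v, t in zip(values, sites) if t == s])
                 for s in set(sites)}
    return [v - site_mean[s] for v, s in zip(values, sites)]

def f_stat(values, labels):
    """One-way ANOVA F statistic of `values` grouped by `labels`."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand, k, n = mean(values), len(groups), len(values)
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_w = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    return (ss_b / (k - 1)) / (ss_w / (n - k))

def permutation_p(values, labels, n_perm=999, seed=0):
    """Stage 2: permutation p-value for the group effect on the residuals."""
    rng = random.Random(seed)
    observed = f_stat(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so relabelings equivalent to the observed one count.
        if f_stat(values, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical gene: two hospitals (B reads about 10 units higher), two groups.
values = [1.0, 1.2, 3.0, 3.1, 11.1, 10.9, 13.2, 12.9]
sites = ["A", "A", "A", "A", "B", "B", "B", "B"]
groups = ["ctl", "ctl", "trt", "trt", "ctl", "ctl", "trt", "trt"]
resid = residuals_after_site(values, sites)
p_value = permutation_p(resid, groups)
```

Working on residuals keeps the hospital offset from inflating the within-group variance, which is the motivation the abstract gives for the first stage.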

5.
Array comparative genomic hybridization (aCGH) provides a technique to survey the human genome for chromosomal aberrations in disease. The identification of genomic regions with aberrations may clarify the initiation and progression of cancer, improve diagnostic and prognostic accuracy, and guide therapy. The analysis of variance (ANOVA) model is widely used to detect differentially expressed genes after accounting for common sources of variation in microarray analysis. In this study, we propose a method, shifted ANOVA, to detect significantly altered regions. This method, based on the standard ANOVA, analyzes changes in copy number variation for regions. The selected regions have the group effect only, but no effect within samples and no interactive effects. The performance of the proposed method is evaluated from the homogeneity and classification accuracies of the selected regions. Shifted ANOVA may identify new candidate genes neighboring known because it detects significantly altered chromosomal regions, rather than independent probes.

6.
Twenty judges performed a variety of chemosensory tasks in order to select the best scores to form a panel for coffee evaluation. An average of correct responses (P%), one-way analysis of variance (ANOVA) and principal components analysis (PCA) were compared. The tests involved: ability to recognize the four basic tastes, identification and matching of odors, taste intensity evaluation and perception of small differences in taste. P% accounted for 71.17 ± 4.34% and 10 of the judges had scores greater than the final average. ANOVA and PCA resulted in 2 different panels consisting of 9 and 12 judges, respectively. The panel was composed of the nine panelists selected by all three methods. The other three panelists, who were doubtful, could improve to the point of acceptance with additional training. These methods should be used simultaneously to give greater confidence in the acceptance or rejection of panelists.

7.
This article focuses on microarray experiments with two or more factors in which treatment combinations of the factors corresponding to the samples paired together onto arrays are not completely random. A main effect of one (or more) factor(s) is confounded with arrays (the experimental blocks). This is called a split-plot microarray experiment. We utilise an analysis of variance (ANOVA) model to assess differentially expressed genes for between-array and within-array comparisons that are generic under a split-plot microarray experiment. Instead of standard t- or F-test statistics that rely on mean square errors of the ANOVA model, we use a robust method, referred to as 'a pooled percentile estimator', to identify genes that are differentially expressed across different treatment conditions. We illustrate the design and analysis of split-plot microarray experiments based on a case application described by Jin et al. A brief discussion of power and sample size for split-plot microarray experiments is also presented.

8.
One of the essential issues in microarray data analysis is to identify differentially expressed genes (DEGs) under different experimental treatments. In this article, a statistical procedure was proposed to identify the DEGs for gene expression data with or without missing observations from microarray experiments with one or two treatment factors. An F statistic based on Henderson method III was constructed to test the significance of differential expression for each gene under different treatment levels. The cutoff P value was adjusted to control the experiment-wise false discovery rate. A human acute leukemia dataset collected from 38 leukemia patients was reanalyzed by the proposed method. In comparison to the results from significance analysis of microarrays (SAM) and microarray analysis of variance (MAANOVA), it was indicated that the proposed method has similar performance to MAANOVA for data with one treatment factor, but MAANOVA cannot directly handle missing data. In addition, a mouse brain dataset collected from six brain regions of two inbred strains (two treatment factors) was reanalyzed to identify genes with distinct region-specific expression patterns. The results showed that the proposed method could identify more distinct region-specific expression patterns than the previous analysis of the same dataset. Moreover, a computer program was developed and incorporated in the software QTModel, which is freely available at .
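
The abstract does not say how the cutoff P value was adjusted. As one common way to control the false discovery rate across genes, the sketch below uses the Benjamini-Hochberg step-up adjustment; this is a swapped-in standard technique, not necessarily the procedure in QTModel, and the function name `bh_adjust` and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values from gene-wise F tests.
adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
# Genes whose adjusted value is at most 0.05 would be declared DEGs at FDR 5%.
selected = [i for i, q in enumerate(adj) if q <= 0.05]
```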

9.
In this work, we introduce in the first part new developments in Principal Component Analysis (PCA) and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate to deal with this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.
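
As a point of reference for the "usual PCA" that WPCA is compared against, the first principal component of a small data matrix can be found by power iteration on the sample covariance matrix. The sketch below shows only that baseline; the weighted correlation coefficient of WPCA is not specified in the abstract and is not attempted. The function name and data are hypothetical.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Leading eigenvector of the sample covariance matrix via power iteration.

    `rows` is a list of observations, each a list of p variable values.
    Assumes the all-ones start vector is not orthogonal to the leading
    eigenvector and that the data are not constant.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical data lying exactly on the line y = 2x.
pc1 = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
```

Because ordinary PCA weights every value equally, a single outlying array can rotate this direction noticeably, which is the weakness the abstract attributes to the usual PCA.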

10.

Saliva is an easy to obtain bodily fluid that is specific to the oral environment. It can be used for metabolomic studies as it is representative of the overall wellbeing of an organism, as well as mouth health and bacterial flora. The metabolomic structure of saliva varies greatly depending on the bacteria present in the mouth as they produce a range of metabolites. In this study we have investigated the metabolomic profiles of human saliva that were obtained using 1H NMR (nuclear magnetic resonance) analysis. 48 samples of saliva were collected from 16 healthy subjects over 3 days. Each sample was split in two and the first half treated with an oral rinse, while the second was left untreated as a control sample. The 96 1H NMR metabolomic profiles obtained in the dataset are affected by three factors, namely 16 subjects, 3 sampling days and 2 treatments. These three factors contribute to the total variation in the dataset. When analysing datasets from saliva using traditional methods such as PCA (principal component analysis), the overall variance is dominated by subjects’ contributions, and we cannot see trends that would highlight the effect of specific factors such as oral rinse. In order to identify these trends, we used methods such as MSCA (multilevel simultaneous component analysis) and ASCA (ANOVA simultaneous component analysis), that provide variance splits according to the experimental factors, so that we could look at the particular effect of treatment on saliva. The analysis of the treatment effect was enhanced, as it was isolated from the overall variance and assessed without confounding factors.
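
The variance split by experimental factor can be illustrated on a single variable with a minimal sketch. It covers only the ANOVA-style partition of main-effect sums of squares in a balanced design, not the component analysis that MSCA and ASCA then apply to each effect; the function name, factor names, and toy intensities are assumptions.

```python
from statistics import mean

def main_effect_ss(records, factor):
    """Sum of squares of one factor's main effect in a balanced design."""
    grand = mean([r["y"] for r in records])
    levels = {}
    for r in records:
        levels.setdefault(r[factor], []).append(r["y"])
    return sum(len(v) * (mean(v) - grand) ** 2 for v in levels.values())

# Hypothetical intensities for one NMR signal: 2 subjects x 2 days x 2 treatments.
# Subject s2 sits 10 units above s1; the rinse adds 1 unit; day has no effect.
records = [
    {"subject": s, "day": d, "treatment": t,
     "y": (10 if s == "s2" else 0) + (1 if t == "rinse" else 0)}
    for s in ("s1", "s2") for d in (1, 2) for t in ("control", "rinse")
]
ss = {f: main_effect_ss(records, f) for f in ("subject", "day", "treatment")}
total = sum((r["y"] - mean([q["y"] for q in records])) ** 2 for r in records)
```

In this toy case the subject effect accounts for 200 of the 202 units of total variation, so an unsplit PCA would mostly show subjects; isolating the treatment term is what lets the rinse effect be examined on its own.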


11.
A multivariate analysis derived from principal components analysis (PCA), which allows the investigation of diet composition data, is introduced. To illustrate the method, prey composition data of stomach contents of brown trout Salmo trutta L. collected in a regulated stream were used. The diet composition, foraging strategies and related patterns of fish diet variation were analysed at a macrohabitat scale (i.e. riffles and glides) by way of biplots. These graphical presentations were consistent with PCA on proportions.

12.
We consider the problem of comparing the gene expression levels of cells grown under two different conditions using cDNA microarray data. We use a quality index, computed from duplicate spots on the same slide, to filter out outlying spots, poor quality genes and problematical slides. We also perform calibration experiments to show that normalization between fluorescent labels is needed and that the normalization is slide dependent and non-linear. A rank invariant method is suggested to select non-differentially expressed genes and to construct normalization curves in comparative experiments. After normalization the residuals from the calibration data are used to provide prior information on variance components in the analysis of comparative experiments. Based on a hierarchical model that incorporates several levels of variations, a method for assessing the significance of gene effects in comparative experiments is presented. The analysis is demonstrated via two groups of experiments with 125 and 4129 genes, respectively, in Escherichia coli grown in glucose and acetate.
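
The rank-invariant idea can be sketched as follows: genes whose intensity rank is nearly the same in both fluorescent channels are treated as non-differentially expressed and can then anchor a normalization curve. This is a simplified single-pass illustration, not the authors' procedure; the threshold `max_shift`, the function names, and the intensities are hypothetical.

```python
def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

def rank_invariant(red, green, max_shift):
    """Indices of genes whose rank differs by at most `max_shift` between channels."""
    r_red, r_green = ranks(red), ranks(green)
    return [i for i in range(len(red))
            if abs(r_red[i] - r_green[i]) <= max_shift]

# Hypothetical spot intensities; gene 2 is strongly induced in the green channel.
red = [10, 20, 30, 40, 50]
green = [12, 19, 90, 41, 55]
strict = rank_invariant(red, green, max_shift=0)
loose = rank_invariant(red, green, max_shift=1)
```

A normalization curve would then be fitted through the (red, green) pairs of the selected genes only, so that truly changed genes do not distort it.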

13.
Mixed model approaches for estimating genetic variances and covariances   (total citations: 62; self-citations: 4; citations by others: 58)
The limitations of analysis of variance (ANOVA) methods in estimating genetic variances are discussed. Among the three methods for mixed linear models (maximum likelihood, ML; restricted maximum likelihood, REML; and minimum norm quadratic unbiased estimation, MINQUE), the MINQUE method is presented with formulae for estimating variance components and covariance components and for predicting genetic effects. Several genetic models that cannot be appropriately analyzed by ANOVA methods are introduced in the form of mixed linear models. Genetic models with independent random effects can be analyzed by the MINQUE(1) method, which is a MINQUE method with all prior values set to 1. The MINQUE(1) method can give unbiased estimation of variance components and covariance components, and linear unbiased prediction (LUP) of genetic effects. There are more complicated genetic models for plant seeds that involve correlated random effects. The MINQUE(0/1) method, which is a MINQUE method with all prior covariances set to 0 and all prior variances set to 1, is suitable for estimating variance and covariance components in these models. Mixed model approaches have an advantage over ANOVA methods in their capacity to analyze unbalanced data and complicated models. Some problems concerning estimation and hypothesis testing by the MINQUE method are discussed.

14.
MOTIVATION: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis. RESULTS: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM-algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to 'denoise' a microarray dataset leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained.

15.
Experiments using cDNA microarrays for the identification of genes with certain expression patterns require a thoughtfully planned design. This study was conducted to determine an optimal design for a microarray experiment to estimate differential gene expression between hybrids and their parental inbred lines in maize (i.e. dominance). It has two features: the contrasts of interest contain more than two genotypes and the procedure may be customised to other microarray experiments where different effects may influence hybridisation signals. A mixed model was used to include all important effects. Impacts during growth of the plant material were taken into consideration as well as those occurring during hybridisation. The results of a preliminary experiment were used to determine which effects were to be included in the model, and data from another microarray experiment were used to estimate variance components. In order to select good designs, an optimality criterion adapted to the problem of differential gene expression between hybrids and their parental inbred lines was defined. Two approaches were used to determine an optimal design: the first one simplifies the problem by dividing it into several subproblems, whereas the second is more sophisticated and uses a simulated annealing (SA) algorithm. We found that the first approach constitutes a useful means for designing microarray experiments to study this problem. Using the more sophisticated SA approach the design can be further improved.

16.
Microarrays provide a valuable tool for the quantification of gene expression. Usually, however, there is a limited number of replicates leading to unsatisfying variance estimates in a gene-wise mixed model analysis. As thousands of genes are available, it is desirable to combine information across genes. When more than two tissue types or treatments are to be compared it might be advisable to consider the array effect as random. Then information between arrays may be recovered, which can increase accuracy in estimation. We propose a method of variance component estimation across genes for a linear mixed model with two random effects. The method may be extended to models with more than two random effects. We assume that the variance components follow a log-normal distribution. Assuming that the sums of squares from the gene-wise analysis, given the true variance components, follow a scaled χ²-distribution, we adopt an empirical Bayes approach. The variance components are estimated by the expectation of their posterior distribution. The new method is evaluated in a simulation study. Differentially expressed genes are more likely to be detected by tests based on these variance estimates than by tests based on gene-wise variance estimates. This effect is most visible in studies with small array numbers. Analyzing a real data set on maize endosperm the method is shown to work well. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

17.
One of the most common and important goals of microarray studies is to identify genes that are differentially expressed between cells of different conditions. T-test and ANOVA models on the expression data are common practices to gauge the significance of the observed difference in expression levels. Transformation of the microarray data is often applied in order to satisfy the model assumptions being entertained. However, the distributional properties of the expression are gene specific, and it is impractical to find a single transformation that is universally optimal for all the genes. This difficulty results in the situation that some genes have to violate the assumptions of the model (e.g., homogeneity in variance, normality). It is thus the interest of this paper to evaluate the impact on the inference of differential expression when the test is performed under an inappropriate scale. Particularly, we quantitatively assess the loss of power when the test is performed under a wrong scale. Normal distribution and log-normal distribution of the expression data are considered. The loss in power is investigated in two scenarios: a transformation is misused, or a transformation fails to be applied. Log transformation and power transformation are particularly considered due to the fact that Box-Cox types of transformation are commonly used in practice. The impact of using a wrong scale is investigated analytically and based on simulations. The loss in power is assessed both as a function of the degree to which the assumptions are violated and as a function of the effect size. Simulations are conducted to quantitatively assess the power loss when tests are performed under a wrong scale. A public experimental microarray dataset is used to illustrate the impact of transformation on the results of testing differential expression. The results show that the loss of power is a function of CV and fold-change (effect size). 
The loss in power depends on the true model and on how severely the assumptions are violated.

18.
Yunsong Qi  Xibei Yang 《Genomics》2013,101(1):38-48
An important application of gene expression data is to classify samples in a variety of diagnostic fields. However, high dimensionality and a small number of noisy samples pose significant challenges to existing classification methods. Focused on the problems of overfitting and sensitivity to noise of the dataset in the classification of microarray data, we propose an interval-valued analysis method based on a rough set technique to select discriminative genes and to use these genes to classify tissue samples of microarray data. We first select a small subset of genes based on interval-valued rough set by considering the preference-ordered domains of the gene expression data, and then classify test samples into certain classes with a term of similar degree. Experiments show that the proposed method is able to reach high prediction accuracies with a small number of selected genes and its performance is robust to noise.

19.
The microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variability in microarray data is expected and unavoidable, so the raw measurements carry inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may bias the interpretation when identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into components of variation caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

20.