Similar articles
20 similar articles found (search time: 31 ms)
1.
Little consideration has been given to the effect of different segmentation methods on the variability of data derived from microarray images. Previous work has suggested that the significant source of variability from microarray image analysis is from estimation of local background. In this study, we used Analysis of Variance (ANOVA) models to investigate the effect of methods of segmentation on the precision of measurements obtained from replicate microarray experiments. We used four different methods of spot segmentation (adaptive, fixed circle, histogram and GenePix) to analyse a total of 156,172 spots from 12 microarray experiments. Using a two-way ANOVA model and the coefficient of repeatability, we show that the method of segmentation significantly affects the precision of the microarray data. The histogram method gave the lowest variability across replicate spots compared to other methods, and had the lowest pixel-to-pixel variability within spots. This effect on precision was independent of background subtraction. We show that these findings have direct, practical implications as the variability in precision between the four methods resulted in different numbers of genes being identified as differentially expressed. Segmentation method is an important source of variability in microarray data that directly affects precision and the identification of differentially expressed genes.
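The abstract names the coefficient of repeatability without defining it. As a rough illustration only, the sketch below assumes the common definition for duplicate measurements, 1.96 × √2 × s_w, where s_w is the within-spot standard deviation estimated from paired differences. The function name and the example readings are hypothetical and are not taken from the study.

```python
import math


def coefficient_of_repeatability(pairs):
    """Repeatability coefficient for duplicate measurements.

    pairs: list of (first, second) replicate readings of the same spot.
    Assumes the definition 1.96 * sqrt(2) * s_w, where s_w is the
    within-spot standard deviation (an assumption, not the paper's code).
    """
    if not pairs:
        raise ValueError("need at least one replicate pair")
    # For duplicates, the within-spot variance is sum(d^2) / (2n).
    within_var = sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))
    return 1.96 * math.sqrt(2) * math.sqrt(within_var)


if __name__ == "__main__":
    # Hypothetical log-intensity readings for three replicated spots.
    example = [(8.1, 8.4), (10.2, 9.9), (6.5, 6.6)]
    print(coefficient_of_repeatability(example))
```

Under this definition a smaller value means replicate spots agree more closely, which is the sense in which a segmentation method would be called more precise.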

2.
Microarray experiments are being increasingly used in molecular biology. A common task is to detect genes with differential expression across two experimental conditions, such as two different tissues or the same tissue at two time points of biological development. To take proper account of statistical variability, some statistical approaches based on the t-statistic have been proposed. In constructing the t-statistic, one needs to estimate the variance of gene expression levels. With a small number of replicated array experiments, the variance estimation can be challenging. For instance, although the sample variance is unbiased, it may have large variability, leading to a large mean squared error. For duplicated array experiments, a new approach based on simple averaging has recently been proposed in the literature. Here we consider two more general approaches based on nonparametric smoothing. Our goal is to assess the performance of each method empirically. The three methods are applied to a colon cancer data set containing 2,000 genes. Using two arrays, we compare the variance estimates obtained from the three methods. We also consider their impact on the t-statistics. Our results indicate that the three methods give variance estimates close to each other. Due to its simplicity and generality, we recommend the use of the smoothed sample variance for data with a small number of replicates.
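To make the idea of a smoothed sample variance concrete, here is a minimal sketch. It assumes a plain moving average over genes ordered by mean expression, which is only one of many possible smoothers and is not necessarily the nonparametric method the authors used. The function name, the `window` parameter and the example values are all hypothetical.

```python
from statistics import mean


def smoothed_variances(gene_means, gene_vars, window=3):
    """Moving average of per-gene sample variances.

    Genes are ordered by mean expression, and each gene's variance is
    replaced by the average over its neighbours in that ordering.
    The result is returned in the original gene order.
    """
    if len(gene_means) != len(gene_vars):
        raise ValueError("gene_means and gene_vars must have equal length")
    order = sorted(range(len(gene_means)), key=lambda i: gene_means[i])
    half = window // 2
    smoothed = [0.0] * len(gene_vars)
    for rank, idx in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        # Average the variances of genes with similar mean expression.
        smoothed[idx] = mean(gene_vars[order[r]] for r in range(lo, hi))
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-gene means and noisy two-replicate variances.
    print(smoothed_variances([1.0, 2.0, 3.0], [1.0, 4.0, 1.0]))
```

Borrowing information from genes with similar expression levels trades a little bias for a large reduction in the variability of each variance estimate, which is the motivation given in the abstract.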

3.
Localization of causal variants underlying known risk loci is one of the main research challenges following genome-wide association studies. Risk loci are typically dissected through fine-mapping experiments in trans-ethnic cohorts for leveraging the variability in the local genetic structure across populations. More recent works have shown that genomic functional annotations (i.e., localization of tissue-specific regulatory marks) can be integrated for increasing fine-mapping performance within single-population studies. Here, we introduce methods that integrate the strength of association between genotype and phenotype, the variability in the genetic backgrounds across populations, and the genomic map of tissue-specific functional elements to increase trans-ethnic fine-mapping accuracy. Through extensive simulations and empirical data, we have demonstrated that our approach increases fine-mapping resolution over existing methods. We analyzed empirical data from a large-scale trans-ethnic rheumatoid arthritis (RA) study and showed that the functional genetic architecture of RA is consistent across European and Asian ancestries. In these data, we used our proposed methods to reduce the average size of the 90% credible set from 29 variants per locus for standard non-integrative approaches to 22 variants.
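The size of a 90% credible set is the headline metric here, so a short sketch of how such a set is usually built may help. It assumes each variant at a locus already has a posterior probability of being causal and that these sum to roughly one. The function name and variant labels are hypothetical, and the authors' integrative model for obtaining the posteriors is not reproduced.

```python
def credible_set(posteriors, level=0.9):
    """Smallest set of variants whose posterior probabilities reach `level`.

    posteriors: dict mapping variant id -> posterior probability of causality.
    Variants are added in decreasing order of probability until the
    cumulative total is at least `level`.
    """
    if not 0 < level <= 1:
        raise ValueError("level must be in (0, 1]")
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= level:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical posteriors for four variants at one locus.
    locus = {"v1": 0.5, "v2": 0.3, "v3": 0.15, "v4": 0.05}
    print(credible_set(locus))
```

A method that concentrates posterior mass on fewer variants yields a smaller set, which is how the reported reduction from 29 to 22 variants per locus should be read.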

4.
5.
We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Due to the nature of gel-based experiments not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to calculate missing data points. Secondly, we combined a singular value decomposition based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results have shown that while imputation of missing data was a useful method to improve the statistical analysis of such data sets, this was of limited use in differentiating between the samples investigated, and highlighted a small number of candidate proteins for further investigation.

6.
7.

Background

With the growing abundance of microarray data, statistical methods are increasingly needed to integrate results across studies. Two common approaches for meta-analysis of microarrays include either combining gene expression measures across studies or combining summaries such as p-values, probabilities or ranks. Here, we compare two Bayesian meta-analysis models that are analogous to these methods.

Results

Two Bayesian meta-analysis models for microarray data have recently been introduced. The first model combines standardized gene expression measures across studies into an overall mean, accounting for inter-study variability, while the second combines probabilities of differential expression without combining expression values. Both models produce the gene-specific posterior probability of differential expression, which is the basis for inference. Since the standardized expression integration model includes inter-study variability, it may improve accuracy of results versus the probability integration model. However, due to the small number of studies typical in microarray meta-analyses, the variability between studies is challenging to estimate. The probability integration model eliminates the need to model variability between studies, and thus its implementation is more straightforward. We found in simulations of two and five studies that combining probabilities outperformed combining standardized gene expression measures for three comparison values: the percent of true discovered genes in meta-analysis versus individual studies; the percent of true genes omitted in meta-analysis versus separate studies, and the number of true discovered genes for fixed levels of Bayesian false discovery. We identified similar results when pooling two independent studies of Bacillus subtilis. We assumed that each study was produced from the same microarray platform with only two conditions: a treatment and control, and that the data sets were pre-scaled.

Conclusion

The Bayesian meta-analysis model that combines probabilities across studies does not aggregate gene expression measures, so an inter-study variability parameter is not included in the model. This results in a simpler modeling approach than aggregating expression measures, which accounts for variability across studies. The probability integration model identified more true discovered genes and fewer true omitted genes than combining expression measures, for our data sets.

8.
Differing arresting agents and protocols can be used to synchronize cells in cultures to specific phases of the cell cycle when studying cell-cycle gene expression. Often, data derived from individual experiments are analyzed separately, since no appropriate statistical methodology is available at the moment to analyze the data from all such experiments simultaneously. The focus of this paper is to determine the association and coherence of the relative activation times of cell-cycling genes under different experimental conditions. Using a circular-circular regression model, we define two parameters, a rotation parameter for the angular difference between cells' arresting times (phases) in two cell-cycle experiments, and an association parameter to describe the correspondence between the cycle times of maximal expression (phase angles) for a set of genes studied in two experiments. Further, we propose a procedure to assess coherence across multiple experiments, i.e. to what extent the circular ordering of the phase angles of genes is maintained across multiple experiments. Coherence of genes across experiments suggests that functionally these genes tend to respond in a stereotypically sequenced way under different experimental conditions. Our proposed methodology is illustrated by applying it to a HeLa cell-cycle gene-expression data set.

9.
We sketch the outlines of a theory of variability discrimination that aggregates localized differences to mediate variability discrimination. This Finding Differences Model was compared to a Positional Entropy Model across four different data sets. Although the two models provide strong and similar fits across three of the data sets, only the Finding Differences Model is applicable to investigations involving multidimensional variability. Furthermore, the Finding Differences Model is based on an activation map that has been shown to have utility for visual search tasks, thus establishing its generality across task domains.

10.
Phenotypic characterization of individual cells provides crucial insights into intercellular heterogeneity and enables access to information that is unavailable from ensemble averaged, bulk cell analyses. Single-cell studies have attracted significant interest in recent years and spurred the development of a variety of commercially available and research-grade technologies. To quantify cell-to-cell variability of cell populations, we have developed an experimental platform for real-time measurements of oxygen consumption (OC) kinetics at the single-cell level. Unique challenges inherent to these single-cell measurements arise, and no existing data analysis methodology is available to address them. Here we present a data processing and analysis method that addresses challenges encountered with this unique type of data in order to extract biologically relevant information. We applied the method to analyze OC profiles obtained with single cells of two different cell lines derived from metaplastic and dysplastic human Barrett's esophageal epithelium. In terms of method development, three main challenges were considered for this heterogeneous dynamic system: (i) high levels of noise, (ii) the lack of a priori knowledge of single-cell dynamics, and (iii) the role of intercellular variability within and across cell types. Several strategies and solutions to address each of these three challenges are presented. Features such as slopes, intercepts, and breakpoints or change-points were extracted for every OC profile and compared across individual cells and cell types. The results demonstrated that the extracted features facilitated exposition of subtle differences between individual cells and their responses to cell-cell interactions. With minor modifications, this method can be used to process and analyze data from other acquisition and experimental modalities at the single-cell level, providing a valuable statistical framework for single-cell analysis.

11.
A key step in the analysis of microarray data is the selection of genes that are differentially expressed. Ideally, such experiments should be properly replicated in order to infer both technical and biological variability, and the data should be subjected to rigorous hypothesis tests to identify the differentially expressed genes. However, in microarray experiments involving the analysis of very large numbers of biological samples, replication is not always practical. Therefore, there is a need for a method to select differentially expressed genes in a rational way from insufficiently replicated data. In this paper, we describe a simple method that uses bootstrapping to generate an error model from a replicated pilot study that can be used to identify differentially expressed genes in subsequent large-scale studies on the same platform, but in which there may be no replicated arrays. The method builds a stratified error model that includes array-to-array variability, feature-to-feature variability and the dependence of error on signal intensity. We apply this model to the characterization of the host response in a model of bacterial infection of human intestinal epithelial cells. We demonstrate the effectiveness of error model based microarray experiments and propose this as a general strategy for a microarray-based screening of large collections of biological samples.
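The bootstrapped error model can be illustrated with a deliberately simplified sketch. It assumes the pilot study supplies replicate-to-replicate log-ratio differences for features that are not differentially expressed, and it resamples them to estimate a cutoff on the absolute difference. Unlike the method described, this version is not stratified by signal intensity, and the function name, parameters and data are hypothetical.

```python
import random


def bootstrap_threshold(replicate_diffs, n_boot=1000, quantile=0.95, seed=0):
    """Bootstrap estimate of a cutoff on |replicate difference|.

    replicate_diffs: differences between replicate measurements of the
    same feature in a pilot study (pure technical noise is assumed).
    Returns the mean, over bootstrap resamples, of the chosen quantile
    of the absolute differences.
    """
    if not replicate_diffs:
        raise ValueError("need at least one replicate difference")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(replicate_diffs)
    estimates = []
    for _ in range(n_boot):
        # Resample with replacement and take the empirical quantile.
        sample = sorted(abs(rng.choice(replicate_diffs)) for _ in range(n))
        estimates.append(sample[min(n - 1, int(quantile * n))])
    return sum(estimates) / n_boot


if __name__ == "__main__":
    # Hypothetical pilot-study differences between replicate arrays.
    pilot = [0.1, -0.2, 0.05, 0.3, -0.15, 0.0, 0.25, -0.1]
    print(bootstrap_threshold(pilot))
```

In an unreplicated follow-up study, a feature whose observed change exceeds this cutoff would be flagged as a candidate; a stratified version would compute a separate cutoff for each signal-intensity bin.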

12.
Patterns of variability in quantitative traits across environmental gradients have received relatively little attention in evolutionary ecology. A recent meta-analysis showed that relative phenotypic variability in body size tends to decrease with improving environmental conditions. This pattern was explained by introducing the concept of upper threshold size to a general optimality model of individual growth but alternative explanations certainly exist. In particular, it is frequently observed in insects that variability in individual growth rates decreases with improving environmental conditions. Here we explore the effect of this phenomenon on environment-specific variability in adult sizes. A quantitative model shows that relative variability in adult sizes is independent of environmental quality if absolute variability in growth rates remains constant across the gradient of environmental quality. Deviations from this borderline case are definitely realistic in both directions. Both negative and positive relationships between relative variability of body size and environmental quality can thus be predicted to arise as a consequence of environment-specific variability in growth rates. The variability itself can be either genetic or environmental in nature. We present empirical data which support both the assumptions and conclusions of our model-based analysis, as well as emphasize the advantages of controlled experiments for understanding the proximate sources of phenotypic variance.

13.
Suppose that independent experiments each indicate general qualitative results, such as higher than normal incidence rates of tumors for exposed populations. This paper suggests methods for amalgamating the qualitative results from several such experiments into a more quantitative form, such as a dose-response relationship. The methods are designed to be robust both to systematic bias in one of the experiments and also to procedural variability across experiments. Data from four rodent experiments with tolazamide are used to illustrate the methods.

14.
Variation in enzymatic transient gene expression assays   Total citations: 8, self-citations: 0, citations by others: 8
We examined causes for high variability in data from enzymatic transient gene expression assays. Our results strongly suggest that variation in transfection efficiency is the major cause of data variation and can seriously compromise valid interpretation of data. We compared averaging data from multiple transfections and cotransfection of a second reporter gene as methods for correcting for variation in transfection efficiency. We found that transfection efficiency can be so highly variable that neither method necessarily overcomes the resulting bias in data. Depending upon the degree of variation in transfection efficiency, a combination of the two methods may be advisable. The need to normalize data for transfection efficiency is dependent upon the difference in strengths of promoters being tested and the relative variability of the transfection method used. We also show that the level of reporter gene expression between transfection experiments performed on different days can vary by more than 10-fold.

15.
MJ Michel, JH Knouft. PLoS ONE 2012, 7(9): e44932
When species distribution models (SDMs) are used to predict how a species will respond to environmental change, an important assumption is that the environmental niche of the species is conserved over evolutionary time-scales. Empirical studies conducted at ecological time-scales, however, demonstrate that the niche of some species can vary in response to environmental change. We use habitat and locality data of five species of stream fishes collected across seasons to examine the effects of niche variability on the accuracy of projections from Maxent, a popular SDM. We then compare these predictions to those from an alternate method of creating SDM projections in which a transformation of the environmental data to similar scales is applied. The niche of each species varied to some degree in response to seasonal variation in environmental variables, with most species shifting habitat use in response to changes in canopy cover or flow rate. SDMs constructed from the original environmental data accurately predicted the occurrences of one species across all seasons and a subset of seasons for two other species. A similar result was found for SDMs constructed from the transformed environmental data. However, the transformed SDMs produced better models in ten of the 14 total SDMs, as judged by ratios of mean probability values at known presences to mean probability values at all other locations. Niche variability should be an important consideration when using SDMs to predict future distributions of species because of its prevalence among natural populations. The framework we present here may potentially improve these predictions by accounting for such variability.

16.
We outline and describe steps for a statistically rigorous approach to analyzing probe-level Affymetrix GeneChip data. The approach employs classical linear mixed models and operates on a gene-by-gene basis. Forgoing any attempts at gene presence or absence calls, the method simultaneously considers the data across all chips in an experiment. Primary output includes precise estimates of fold change (some as low as 1.1), their statistical significance, and measures of array and probe variability. The method can accommodate complex experiments involving many kinds of treatments and can test for their effects at the probe level. Furthermore, mismatch probe data can be incorporated in different ways or ignored altogether. Data from an ionizing radiation experiment on human cell lines illustrate the key concepts.

17.
Agricultural production systems face increasing threats from more frequent and extreme weather fluctuations associated with global climate change. While there is mounting evidence that increased plant community diversity can reduce the variability of ecosystem functions (such as primary productivity) in the face of environmental fluctuation, there has been little work testing whether this is true for intensively managed agricultural systems. Using statistical modeling techniques to fit environment–productivity relationships offers an efficient means of leveraging hard-won experimental data to compare the potential variability of different mixtures across a wide range of environmental contexts. We used data from two multiyear field experiments to fit climate–soil–productivity models for two pasture mixtures under intensive grazing—one composed of two drought-sensitive species (standard), and an eight-species mixture including several drought-resistant species (complex). We then used these models to undertake a scoping study estimating the mean and coefficient of variation (CV) of annual productivity for long-term climate data covering all New Zealand on soils with low, medium, or high water-holding capacity. Our results suggest that the complex mixture is likely to have consistently lower CV in productivity, irrespective of soil type or climate regime. Predicted differences in mean annual productivity between mixtures were strongly influenced by soil type and were closely linked to mean annual soil water availability across all soil types. Differences in the CV of productivity were only strongly related to interannual variance in water availability for the lowest water-holding capacity soil. Our results show that there is considerable scope for mixtures including drought-tolerant species to enhance certainty in intensive pastoral systems. This provides justification for investing resources in a large-scale distributed experiment involving many sites under different environmental contexts to confirm these findings.
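The comparison between mixtures rests on the coefficient of variation (CV) of annual productivity, that is, the standard deviation divided by the mean. A minimal sketch follows, using the sample standard deviation; the yield figures are invented for illustration and do not come from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(annual_yields):
    """Sample standard deviation divided by the mean of annual productivity."""
    m = mean(annual_yields)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(annual_yields) / m


if __name__ == "__main__":
    # Hypothetical annual productivity (t DM/ha) for two pasture mixtures.
    standard = [9.0, 13.0, 7.5, 12.5]
    complex_mix = [10.0, 11.5, 9.5, 11.0]
    print(coefficient_of_variation(standard))
    print(coefficient_of_variation(complex_mix))
```

Because the CV is scaled by the mean, it allows year-to-year stability to be compared between mixtures whose average productivity differs.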

18.
Animal Behaviour 1987, 35(5): 1366-1375
Some elementary points concerning the statistical consequences of individual differences in behaviour are discussed. Various ways are suggested of taking individual differences into account when designing experiments. True individual differences are represented by the within-group variability that remains after the effects of measurement error have been excluded. Thus, mere variability in data does not necessarily demonstrate that true individual differences are present. Individual differences have two important effects. First, statistical power is reduced, which means that true effects are more difficult to detect. Second, statements about groups may be untrue for all individuals in the group and, conversely, group characteristics may be difficult to infer from measurements of individuals. Some simple tactics for coping with individual differences in experimental data are outlined: (1) obtaining repeated outcome scores for each subject; (2) obtaining a baseline score for each subject prior to the experimental treatment; (3) a combination of 1 and 2; and (4) using a longitudinal design, i.e. obtaining a series of scores across time for each subject. Each of these tactics also has the merit of providing additional information about the nature of the response to the treatment. A fifth tactic is to increase the sample size. Finally, some possible disadvantages of a sixth tactic, that of matching experimental and control subjects, are pointed out.

19.
Tables of means, over assessors, are often used to summarize the results of sensory profile experiments. These tables are sometimes further summarized by Principal Components Analysis (PCA) to give plots of the samples in the principal sensory dimensions. An alternative procedure is to use Generalized Procrustes Analysis (GPA) on the assessor data to allow for differences in usage of the vocabulary and in the proportion of the scale used. It is shown that these methods give different configurations in the principal sensory dimensions when applied to the data from a study of cheeses (Muir et al. 1995). Using a Jackknife method to calculate the variability of the samples in the principal sensory dimensions, the results from the GPA method are shown to have a higher dimensionality than from the PCA method. Jackknife estimates of variability are used to calculate confidence ellipses to attach to the sensory space maps.
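The jackknife estimates of variability mentioned above can be sketched in one dimension. The example assumes the standard leave-one-out jackknife, in which each assessor is dropped in turn and the statistic is recomputed. The paper applies the idea to sample positions in a multidimensional sensory space to obtain confidence ellipses, which this sketch does not attempt; the function name and scores are hypothetical.

```python
import math
from statistics import mean


def jackknife_se(values, statistic=mean):
    """Leave-one-out jackknife standard error of `statistic`.

    values: one score per assessor. Each assessor is omitted in turn,
    the statistic is recomputed, and the spread of those leave-one-out
    estimates is scaled by (n - 1) / n.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out))


if __name__ == "__main__":
    # Hypothetical scores from five assessors for one cheese on one axis.
    print(jackknife_se([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For the mean, this reproduces the usual standard error (sample standard deviation divided by √n); the value of the jackknife is that the same recipe works for statistics such as PCA or GPA coordinates that have no simple formula.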

20.
Previous research suggests that visual attention can be allocated to locations in space (space-based attention) and to objects (object-based attention). The cueing effects associated with space-based attention tend to be large and are found consistently across experiments. Object-based attention effects, however, are small and found less consistently across experiments. In three experiments we address the possibility that variability in object-based attention effects across studies reflects low incidence of such effects at the level of individual subjects. Experiment 1 measured space-based and object-based cueing effects for horizontal and vertical rectangles in 60 subjects comparing commonly used target detection and discrimination tasks. In Experiment 2 we ran another 120 subjects in a target discrimination task in which rectangle orientation varied between subjects. Using parametric statistical methods, we found object-based effects only for horizontal rectangles. Bootstrapping methods were used to measure effects in individual subjects. Significant space-based cueing effects were found in nearly all subjects in both experiments, across tasks and rectangle orientations. However, only a small number of subjects exhibited significant object-based cueing effects. Experiment 3 measured only object-based attention effects using another common paradigm and again, using bootstrapping, we found only a small number of subjects that exhibited significant object-based cueing effects. Our results show that object-based effects are more prevalent for horizontal rectangles, which is in accordance with the theory that attention may be allocated more easily along the horizontal meridian. The fact that so few individuals exhibit a significant object-based cueing effect presumably is why previous studies of this effect might have yielded inconsistent results. The results from the current study highlight the importance of considering individual subject data in addition to commonly used statistical methods.
