首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 8 毫秒
1.
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceed the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble, and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results that under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays.  相似文献   

2.
This paper will give a complete methodological approach to the processing of oligonucleotide microarray data from postmortem tissue, particularly brain matter. Attention will be drawn to each of the important stages in the process; specifically the quality control, gene expression value calculation, multiple hypothesis testing and correlation analyses. We shall initially discuss the theoretical foundations of each individual method and subsequently apply the ensemble to a sample data set to illustrate and visualise important points.  相似文献   

3.
基因芯片筛选差异表达基因方法比较   总被引:1,自引:0,他引:1  
单文娟  童春发  施季森 《遗传》2008,30(12):1640-1646
摘要: 使用计算机模拟数据和真实的芯片数据, 对8种筛选差异表达基因的方法进行了比较分析, 旨在比较不同方法对基因芯片数据的筛选效果。模拟数据分析表明, 所使用的8种方法对均匀分布的差异表达基因有很好的识别、检出作用。算法方面, SAM和Wilcoxon秩和检验方法较好; 数据分布方面, 正态分布的识别效果较好, 卡方分布和指数分布的识别效果较差。杨树cDNA芯片分析表明, SAM、Samroc和回归模型方法相近, 而Wilcoxon秩和检验方法与它们有较大差异。  相似文献   

4.
MOTIVATION: Microarray experiments generate a high data volume. However, often due to financial or experimental considerations, e.g. lack of sample, there is little or no replication of the experiments or hybridizations. These factors combined with the intrinsic variability associated with the measurement of gene expression can result in an unsatisfactory detection rate of differential gene expression (DGE). Our motivation was to provide an easy to use measure of the success rate of DGE detection that could find routine use in the design of microarray experiments or in post-experiment assessment. RESULTS: In this study, we address the problem of both random errors and systematic biases in microarray experimentation. We propose a mathematical model for the measured data in microarray experiments and on the basis of this model present a t-based statistical procedure to determine DGE. We have derived a formula to determine the success rate of DGE detection that takes into account the number of microarrays, the number of genes, the magnitude of DGE, and the variance from biological and technical sources. The formula and look-up tables based on the formula, can be used to assist in the design of microarray experiments. We also propose an ad hoc method for estimating the fraction of non-differentially expressed genes within a set of genes being tested. This will help to increase the power of DGE detection. AVAILABILITY: The functions to calculate the success rate of DGE detection have been implemented as a Java application, which is accessible at http://www.le.ac.uk/mrctox/microarray_lab/Microarray_Softwares/Microarray_Softwares.htm  相似文献   

5.
6.
A statistical framework for genomic data fusion   总被引:8,自引:0,他引:8  
MOTIVATION: During the past decade, the new focus on genomics has highlighted a particular challenge: to integrate the different views of the genome that are provided by various types of experimental data. RESULTS: This paper describes a computational framework for integrating and drawing inferences from a collection of genome-wide measurements. Each dataset is represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins. The kernel representation is both flexible and efficient, and can be applied to many different types of data. Furthermore, kernel functions derived from different types of data can be combined in a straightforward fashion. Recent advances in the theory of kernel methods have provided efficient algorithms to perform such combinations in a way that minimizes a statistical loss function. These methods exploit semidefinite programming techniques to reduce the problem of finding optimizing kernel combinations to a convex optimization problem. Computational experiments performed using yeast genome-wide datasets, including amino acid sequences, hydropathy profiles, gene expression data and known protein-protein interactions, demonstrate the utility of this approach. A statistical learning algorithm trained from all of these data to recognize particular classes of proteins--membrane proteins and ribosomal proteins--performs significantly better than the same algorithm trained on any single type of data. AVAILABILITY: Supplementary data at http://noble.gs.washington.edu/proj/sdp-svm  相似文献   

7.
MOTIVATION: DNA microarrays are now capable of providing genome-wide patterns of gene expression across many different conditions. The first level of analysis of these patterns requires determining whether observed differences in expression are significant or not. Current methods are unsatisfactory due to the lack of a systematic framework that can accommodate noise, variability, and low replication often typical of microarray data. RESULTS: We develop a Bayesian probabilistic framework for microarray data analysis. At the simplest level, we model log-expression values by independent normal distributions, parameterized by corresponding means and variances with hierarchical prior distributions. We derive point estimates for both parameters and hyperparameters, and regularized expressions for the variance of each gene by combining the empirical variance with a local background variance associated with neighboring genes. An additional hyperparameter, inversely related to the number of empirical observations, determines the strength of the background variance. Simulations show that these point estimates, combined with a t -test, provide a systematic inference approach that compares favorably with simple t -test or fold methods, and partly compensate for the lack of replication.  相似文献   

8.
We propose a general theoretical framework for analyzing differentially expressed genes and behavior patterns from two homogenous short time-course data. The framework generalizes the recently proposed Hilbert-Schmidt Independence Criterion (HSIC)-based framework adapting it to the time-series scenario by utilizing tensor analysis for data transformation. The proposed framework is effective in yielding criteria that can identify both the differentially expressed genes and time-course patterns of interest between two time-series experiments without requiring to explicitly cluster the data. The results, obtained by applying the proposed framework with a linear kernel formulation, on various data sets are found to be both biologically meaningful and consistent with published studies.  相似文献   

9.
Pok G  Liu JC  Ryu KH 《Bioinformation》2010,4(8):385-389
The microarray technique has become a standard means in simultaneously examining expression of all genes measured in different circumstances. As microarray data are typically characterized by high dimensional features with a small number of samples, feature selection needs to be incorporated to identify a subset of genes that are meaningful for biological interpretation and accountable for the sample variation. In this article, we present a simple, yet effective feature selection framework suitable for two-dimensional microarray data. Our correlation-based, nonparametric approach allows compact representation of class-specific properties with a small number of genes. We evaluated our method using publicly available experimental data and obtained favorable results.  相似文献   

10.
Lam Tran  Kevin He  Di Wang  Hui Jiang 《Biometrics》2023,79(2):1280-1292
The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.  相似文献   

11.
12.

Background  

Recent reanalysis of spike-in datasets underscored the need for new and more accurate benchmark datasets for statistical microarray analysis. We present here a fresh method using biologically-relevant data to evaluate the performance of statistical methods.  相似文献   

13.

Background  

The disparate results from the methods commonly used to determine differential expression in Affymetrix microarray experiments may well result from the wide variety of probe set and probe level models employed. Here we take the approach of making the fewest assumptions about the structure of the microarray data. Specifically, we only require that, under the null hypothesis that a gene is not differentially expressed for specified conditions, for any probe position in the gene's probe set: a) the probe amplitudes are independent and identically distributed over the conditions, and b) the distributions of the replicated probe amplitudes are amenable to classical analysis of variance (ANOVA). Log-amplitudes that have been standardized within-chip meet these conditions well enough for our approach, which is to perform ANOVA across conditions for each probe position, and then take the median of the resulting (1 - p) values as a gene-level measure of differential expression.  相似文献   

14.
Modular variation of multivariate traits results from modular distribution of effects of genetic and epigenetic interactions among those traits. However, statistical methods rarely detect truly modular patterns, possibly because the processes that generate intramodular associations may overlap spatially. Methodologically, this overlap may cause multiple patterns of modularity to be equally consistent with observed covariances. To deal with this indeterminacy, the present study outlines a framework for testing a priori hypotheses of modularity in which putative modules are mathematically represented as multidimensional subspaces embedded in the data. Model expectations are computed by subdividing the data into arrays of variables, and intermodular interactions are represented by overlapping arrays. Covariance structures are thus modeled as the outcome of complex and nonorthogonal intermodular interactions. This approach is demonstrated by analyzing mandibular modularity in nine rodent species. A total of 620 models are fit to each species, and the most strongly supported are heuristically modified to improve their fit. Five modules common to all species are identified, which approximately map to the developmental modules of the mandible. Within species, these modules are embedded within larger "super-modules," suggesting that these conserved modules act as building blocks from which covariation patterns are built.  相似文献   

15.
刘万霖  李栋  朱云平  贺福初 《遗传》2007,29(12):1434-1442
随着微阵列数据的快速增长, 微阵列基因表达数据日益成为生物信息学研究的重要数据源。利用微阵列基因表达数据构建基因调控网络也成为一个研究热点。通过构建基因调控网络, 可以解读复杂的调控关系, 发现细胞内的调控模式, 并进而在系统尺度上理解生物学进程。近年来, 人们引入了多种算法来利用基因芯片数据构建基因调控网络。文章回顾了这些算法的发展历史, 尤其是其在理论和方法上的改进, 给出了一些相关的软件平台, 并预测了该领域可能的发展趋势。  相似文献   

16.
Sun W 《Biometrics》2012,68(1):1-11
RNA-seq may replace gene expression microarrays in the near future. Using RNA-seq, the expression of a gene can be estimated using the total number of sequence reads mapped to that gene, known as the total read count (TReC). Traditional expression quantitative trait locus (eQTL) mapping methods, such as linear regression, can be applied to TReC measurements after they are properly normalized. In this article, we show that eQTL mapping, by directly modeling TReC using discrete distributions, has higher statistical power than the two-step approach: data normalization followed by linear regression. In addition, RNA-seq provides information on allele-specific expression (ASE) that is not available from microarrays. By combining the information from TReC and ASE, we can computationally distinguish cis- and trans-eQTL and further improve the power of cis-eQTL mapping. Both simulation and real data studies confirm the improved power of our new methods. We also discuss the design issues of RNA-seq experiments. Specifically, we show that by combining TReC and ASE measurements, it is possible to minimize cost and retain the statistical power of cis-eQTL mapping by reducing sample size while increasing the number of sequence reads per sample. In addition to RNA-seq data, our method can also be employed to study the genetic basis of other types of sequencing data, such as chromatin immunoprecipitation followed by DNA sequencing data. In this article, we focus on eQTL mapping of a single gene using the association-based method. However, our method establishes a statistical framework for future developments of eQTL mapping methods using RNA-seq data (e.g., linkage-based eQTL mapping), and the joint study of multiple genetic markers and/or multiple genes.  相似文献   

17.

Background  

Typical analysis of microarray data ignores the correlation between gene expression values. In this paper we present a model for microarray data which specifically allows for correlation between genes. As a result we combine gene network ideas with linear models and differential expression.  相似文献   

18.
19.
20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号