Similar literature
Found 20 similar articles (search time: 9 ms)
1.
2.
Normalisation is an essential first step in the analysis of most cDNA microarray data, to correct for effects arising from imperfections in the technology. Loess smoothing is commonly used to correct for trends in log-ratio data. However, parametric models, such as the additive plus multiplicative variance model, have been preferred for scale normalisation, though the variance structure of microarray data may be of a more complex nature than can be accommodated by a parametric model. We propose a new nonparametric approach that incorporates location and scale normalisation simultaneously using a Generalised Additive Model for Location, Scale and Shape (GAMLSS, Rigby and Stasinopoulos, 2005, Applied Statistics, 54, 507-554). We compare its performance in inferring differential expression with Huber et al.'s (2002, Bioinformatics, 18, 96-104) arsinh variance stabilising transformation (AVST) using real and simulated data. We show GAMLSS to be as powerful as AVST when the parametric model is correct, and more powerful when the model is wrong.
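
The arsinh transformation named in this abstract can be illustrated with a small sketch. It assumes the simple form h(x) = arsinh(a + b·x). The function name `arsinh_transform` and the values of `a` and `b` are hypothetical and chosen only for illustration; in practice such parameters would be estimated from the data. The sketch does not reproduce the GAMLSS approach.

```python
import math

def arsinh_transform(intensity, a=0.0, b=1.0):
    """Return arsinh(a + b * intensity); a and b are illustrative parameters."""
    return math.asinh(a + b * intensity)

# Unlike the logarithm, arsinh is defined at zero and for negative
# background-corrected intensities.
low = [arsinh_transform(x) for x in (-5.0, 0.0, 5.0)]

# For large intensities arsinh(x) approaches log(2x), so differences of
# transformed values behave like log-ratios.
high_gap = arsinh_transform(1000.0) - math.log(2 * 1000.0)
```

Near zero the transformation is roughly linear, which is why it stays well behaved for weak spots where a plain log-ratio becomes unstable.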

3.
Shannon entropy is used to provide an estimate of the number of interpretable components in a principal component analysis. In addition, several ad hoc stopping rules for dimension determination are reviewed and a modification of the broken stick model is presented. The modification incorporates a test for the presence of an "effective degeneracy" among the subspaces spanned by the eigenvectors of the correlation matrix of the data set, and then allocates the total variance among subspaces. A summary of the performance of the methods applied to both published microarray data sets and to simulated data is given.
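
Two of the ideas in this abstract can be sketched in a few lines. The sketch assumes that the entropy-based estimate is exp(H), where H is the Shannon entropy of the eigenvalues expressed as proportions of total variance, and it implements the classical broken-stick rule rather than the modified version the abstract describes. The function names and the eigenvalues are made up for illustration.

```python
import math

def entropy_dimension(eigenvalues):
    """Effective number of components, exp(H), from the Shannon entropy H
    of the eigenvalues normalised to proportions of total variance."""
    total = sum(eigenvalues)
    props = [ev / total for ev in eigenvalues if ev > 0]
    entropy = -sum(p * math.log(p) for p in props)
    return math.exp(entropy)

def broken_stick_count(eigenvalues):
    """Number of leading components whose variance share exceeds the
    classical broken-stick expectation b_k = (1/p) * sum_{i=k..p} 1/i."""
    p = len(eigenvalues)
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    kept = 0
    for k, ev in enumerate(ordered, start=1):
        expected = sum(1.0 / i for i in range(k, p + 1)) / p
        if ev / total <= expected:
            break
        kept += 1
    return kept

eigs = [4.0, 1.0, 0.5, 0.3, 0.2]  # made-up eigenvalues of a correlation matrix
n_entropy = entropy_dimension(eigs)
n_stick = broken_stick_count(eigs)
```

With equal eigenvalues the entropy estimate equals the number of variables, while the broken-stick rule retains nothing, which shows how differently the two rules treat an unstructured spectrum.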

4.
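Preparation of a positive control for cDNA microarrays and its application in microarray sensitivity analysis   Total citations: 2 (self-citations: 0, citations by others: 2)
The cDNA microarray is a high-throughput technology for gene expression profiling, with broad prospects for profiling cellular gene expression under physiological and pathological conditions, discovering new genes, and studying gene function. The choice of positive control and the detection sensitivity of the cDNA microarray are among the key issues for its successful application. Using the firefly luciferase gene, which has little phylogenetic homology with human genes, as the starting material, we prepared a universal positive-control probe and a corresponding mRNA reference for cDNA microarrays used in expression profiling of human and other animal genes. After the mRNA reference was labeled with Cy3 by reverse transcription and hybridized to the DNA microarray, it hybridized specifically to the luciferase cDNA fragment and showed no hybridization with the human β-actin gene, the human G3PDH gene, or λDNA/HindIII. The mRNA reference was spiked into HepG2 total RNA at different ratios, fluorescently labeled by reverse transcription, and hybridized to the cDNA microarray. When the mRNA reference in the total RNA was at a 1/10^4 dilution (about 10^8 mRNA molecules), the microarray could barely detect any hybridization signal from the labeled product. In addition, signal intensity was closely related to the concentration of probe immobilized on the array: the hybridization signal was strongest at a probe concentration of 2 g/L and weakened as the probe concentration decreased. The preparation of this universal positive reference and its use in studying detection sensitivity provide a technical and theoretical basis for applying cDNA microarrays to high-throughput expression profiling in humans and other animals and to functional studies of new genes.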

5.
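Gene microarray data are inherently nonlinear, so analyzing them with linear methods inevitably introduces bias. It is therefore worthwhile to analyze comprehensively the technical characteristics of the nonlinear dimensionality-reduction method Isomap and the points that require attention when applying it to microarray data analysis.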

6.
MOTIVATION: Microarrays have been used to identify differential expression of individual genes or cluster genes that are coexpressed over various conditions. However, alteration in coexpression relationships has not been studied. Here we introduce a model for finding differential coexpression from microarrays and test its biological validity with respect to cancer. RESULTS: We collected 10 published gene expression datasets from cancers of 13 different tissues and constructed 2 distinct coexpression networks: a tumor network and normal network. Comparison of the two networks showed that cancer affected many coexpression relationships. Functional changes such as alteration in energy metabolism, promotion of cell growth and enhanced immune activity were accompanied by coexpression changes. Coregulation of collagen genes that may control invasion and metastatic spread of tumor cells was also found. Cluster analysis in the tumor network identified groups of highly interconnected genes related to ribosomal protein synthesis, the cell cycle and antigen presentation. Metallothionein expression was also found to be clustered, which may play a role in apoptosis control in tumor cells. Our results show that this model would serve as a novel method for analyzing microarrays beyond the specific implications for cancer.
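
The idea of differential coexpression can be sketched for a single gene pair. The abstract does not specify the model, so this sketch assumes Pearson correlation as the coexpression measure and takes the difference between the tumor and normal correlations. The function names and the expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def coexpression_change(gene_a, gene_b):
    """Difference in correlation (tumor minus normal) for one gene pair.
    Each argument maps a condition name to expression values across samples."""
    r_tumor = pearson(gene_a["tumor"], gene_b["tumor"])
    r_normal = pearson(gene_a["normal"], gene_b["normal"])
    return r_tumor - r_normal

# Made-up expression values for two hypothetical genes.
gene_a = {"tumor": [1.0, 2.0, 3.0, 4.0], "normal": [1.0, 2.0, 3.0, 4.0]}
gene_b = {"tumor": [2.0, 4.0, 6.0, 8.0], "normal": [4.0, 3.0, 2.0, 1.0]}

# The pair is coexpressed in tumor samples and anti-correlated in normal ones.
delta = coexpression_change(gene_a, gene_b)
```

Applying such a comparison to every gene pair, with a threshold on the change, would give one simple way to contrast a tumor network with a normal network.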

7.
cDNA microarray technology and its applications   Total citations: 18 (self-citations: 0, citations by others: 18)
The cDNA microarray is the most powerful tool for studying gene expression in many different organisms. It has been successfully applied to the simultaneous expression analysis of many thousands of genes and to large-scale gene discovery, as well as polymorphism screening and mapping of genomic DNA clones. It is a high throughput, highly parallel RNA expression assay technique that permits quantitative analysis of RNAs transcribed from both known and unknown genes. This technique provides diagnostic fingerprints by comparing gene expression patterns in normal and pathological cells, and because it can simultaneously track expression levels of many genes, it provides a source of operational context for inference and prediction about complex cell control systems. This review describes this recently developed cDNA microarray technology and its application to gene discovery and expression, and to diagnostics for certain diseases.

8.
Wang S, Zhu J. Biometrics, 2008, 64(2): 440-448
Summary. Variable selection in high-dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model-based clustering. Unlike the classical L1-norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural "group." Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach.

9.
10.

Background  

In the microarray experiment, many undesirable systematic variations are commonly observed. Normalization is the process of removing such variation that affects the measured gene expression levels. Normalization plays an important role in the earlier stage of microarray data analysis. The subsequent analysis results are highly dependent on normalization. One major source of variation is the background intensities. Recently, some methods have been employed for correcting the background intensities. However, all these methods focus on defining signal intensities appropriately from foreground and background intensities in the image analysis. Although a number of normalization methods have been proposed, no systematic methods have been proposed using the background intensities in the normalization process.

11.
12.
Tracy L Bergemann. 遗传学报, 2010, 37(4): 265-279
This research provides a new way to measure error in microarray data in order to improve gene expression analysis. Microarray data contains many sources of error. In order to glean information about mRNA expression levels, the true signal must first be segregated from noise. This research focuses on the variation that can be captured at the spot level in cDNA microarray images. Variation at other levels, due to differences at the array, dye, and block levels, can be corrected for by a variety of existing normalization procedures. Two signal quality estimates that capture the reliability of each spot printed on a microarray are described. A parametric estimate of within-spot variance, referred to here as σ²_spot, assumes that pixels follow a normal distribution and are spatially correlated. A non-parametric estimate of error, called the mean square prediction error (MSPE), assumes that spots of high quality possess pixels that are similar to their neighbors. This paper will provide a framework to use either spot quality measure in downstream analysis, specifically as weights in regression models. Using these spot quality estimates as weights can result in greater efficiency, in a statistical sense, when modeling microarray data.
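
The use of spot quality estimates as regression weights can be illustrated with a small weighted least-squares sketch. It assumes inverse-variance weighting, which the abstract does not state explicitly, and it does not compute σ²_spot or MSPE from pixel data. The function name and all numbers are made up for illustration.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least-squares fit of y = intercept + slope * x.
    Returns (intercept, slope)."""
    w_sum = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data: the first four points lie on y = 1 + 2x; the last spot is
# an outlier that also has a large (hypothetical) within-spot variance.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 20.0]
spot_variance = [0.1, 0.1, 0.1, 0.1, 100.0]
weights = [1.0 / v for v in spot_variance]  # assumed inverse-variance weights

unweighted = weighted_linear_fit(x, y, [1.0] * len(x))
weighted = weighted_linear_fit(x, y, weights)
```

In this toy example the low-quality spot pulls the unweighted slope well away from 2, while the weighted fit stays close to it, which is the kind of efficiency gain the abstract refers to.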

13.
MOTIVATION: Experimental limitations have resulted in the popularity of parametric statistical tests as a method for identifying differentially regulated genes in microarray data sets. However, these tests assume that the data follow a normal distribution. To date, the assumption that replicate expression values for any gene are normally distributed has not been critically addressed for Affymetrix GeneChip data. RESULTS: The normality of the expression values calculated using four different commercial and academic software packages was investigated using a data set consisting of the same target RNA applied to 59 human Affymetrix U95A GeneChips using a combination of statistical tests and visualization techniques. For the majority of probe sets obtained from each analysis suite, the expression data showed a good correlation with normality. The exception was a large number of low-expressed genes in the data set produced using Affymetrix Microarray Suite 5.0, which showed a striking non-normal distribution. In summary, our data provide strong support for the application of parametric tests to GeneChip data sets without the need for data transformation.

14.
In this study we present two novel normalization schemes for cDNA microarrays. They are based on iterative local regression and optimization of model parameters by generalized cross-validation. Permutation tests assessing the efficiency of normalization demonstrated that the proposed schemes have an improved ability to remove systematic errors and to reduce variability in microarray data. The analysis also reveals that without parameter optimization local regression is frequently insufficient to remove systematic errors in microarray data.

15.
16.
The Microarray Explorer (MAExplorer) is a versatile Java-based data mining bioinformatic tool for analyzing quantitative cDNA expression profiles across multiple microarray platforms and DNA labeling systems. It may be run as either a stand-alone application or as a Web browser applet over the Internet. With this program it is possible to (i) analyze the expression of individual genes, (ii) analyze the expression of gene families and clusters, (iii) compare expression patterns and (iv) directly access other genomic databases for clones of interest. Data may be downloaded as required from a Web server or in the case of the stand-alone version, reside on the user’s computer. Analyses are performed in real-time and may be viewed and directly manipulated in images, reports, scatter plots, histograms, expression profile plots and cluster analyses plots. A key feature is the clone data filter for constraining a working set of clones to those passing a variety of user-specified logical and statistical tests. Reports may be generated with hypertext Web access to UniGene, GenBank and other Internet databases for sets of clones found to be of interest. Users may save their explorations on the Web server or local computer and later recall or share them with other scientists in this groupware Web environment. The emphasis on direct manipulation of clones and sets of clones in graphics and tables provides a high level of interaction with the data, making it easier for investigators to test ideas when looking for patterns. We have used the MAExplorer to profile gene expression patterns of 1500 duplicated genes isolated from mouse mammary tissue. We have identified genes that are preferentially expressed during pregnancy and during lactation. One gene we identified, carbonic anhydrase III, is highly expressed in mammary tissue from virgin and pregnant mice and in gene knock-out mice with underdeveloped mammary epithelium. Other genes, which include those encoding milk proteins, are preferentially expressed during lactation. MAExplorer may be accessed at http://www.lecb.ncifcrf.gov.MAExplorer.

17.
This research provides a new way to measure error in microarray data in order to improve gene expression analysis. Microarray data contains many sources of error. In order to glean information about mRNA expression levels, the true signal must first be segregated from noise. This research focuses on the variation that can be captured at the spot level in cDNA microarray images. Variation at other levels, due to differences at the array, dye, and block levels, can be corrected for by a variety of existing normalization procedures. Two signal quality estimates that capture the reliability of each spot printed on a microarray are described. A parametric estimate of within-spot variance, referred to here as σ²_spot, assumes that pixels follow a normal distribution and are spatially correlated. A non-parametric estimate of error, called the mean square prediction error (MSPE), assumes that spots of high quality possess pixels that are similar to their neighbors. This paper will provide a framework to use either spot quality measure in downstream analysis, specifically as weights in regression models. Using these spot quality estimates as weights can result in greater efficiency, in a statistical sense, when modeling microarray data.

18.
Both cDNA microarray and spectroscopic data provide indirect information about the chemical compounds present in the biological tissue under consideration. In this paper simple univariate and bivariate measures are used to investigate correlations between both types of high dimensional analyses. A large dataset of 42 hemp samples, on which 3456 cDNA clones and 351 NIR wavelengths have been measured, was analyzed using graphical representations. For this purpose we propose clustered correlation and clustered discrimination images. Large, tissue-related differences are seen to dominate the cDNA-NIR correlation structure but smaller, more difficult to detect, variety-related differences can be found at specific cDNA clone/NIR wavelength combinations.

19.
This research provides a new way to measure error in microarray data in order to improve gene expression analysis. Microarray data contains many sources of error. In order to glean information about mRNA expression levels, the true signal must first be segregated from noise. This research focuses on the variation that can be captured at the spot level in cDNA microarray images. Variation at other levels, due to differences at the array, dye, and block levels, can be corrected for by a variety of existing normalizati...

20.
In accordance with general principles recommended by the International Committee for Standardization in Haematology (1982, Journal of Clinical Pathology 35, 1320-1322), we have developed statistical methods for the analysis of red cell volume distributions. To select an appropriate reference distribution for goodness-of-fit testing, we derived a mathematical model of erythropoiesis that predicted a lognormal form for the distribution of erythrocyte volumes. Model predictions were then tested using samples obtained from 50 healthy individuals. Each grouped red cell volume distribution was doubly-truncated to eliminate artifactual frequency counts. Distribution parameter estimates were computed using the expectation-maximization algorithm, a missing information technique. Results of the one-sample chi-square goodness-of-fit test showed a fairly even distribution of P-values over the interval. Examples of the application of these statistical procedures to distributions from patients with anemia are given. Our results suggest that, for the analysis of red blood cell volumes, (i) parameter estimation should be made with the expectation-maximization method, and (ii) the truncated lognormal distribution should be used as a reference distribution for goodness-of-fit testing. This method could be applied to any set of grouped doubly-truncated data which, after transformation, follows the normal model.
