Similar literature
 20 similar articles found (search time: 8 ms)
1.
MOTIVATION: Detailed comparison and analysis of the output of DNA gene expression arrays from multiple samples require global normalization of the individual gene intensities measured in the different hybridizations. This is needed to account for variations in array preparation and sample hybridization conditions. RESULTS: Here, we present a simple, robust and accurate procedure, based on principal component analysis, for the global normalization of datasets generated with single-channel DNA arrays. The procedure makes minimal assumptions about the data and performs well in cases where other standard procedures produced biased estimates. It is also insensitive to data transformation, filtering (thresholding) and pre-screening.

2.
3.

Background  

Time series gene expression data analysis is widely used to study the dynamics of various cell processes. Most of the time series data available today consist of only a few time points, which makes standard clustering techniques difficult to apply.

4.
MOTIVATION: Genome-wide gene expression measurements, as currently determined by microarray technology, can be represented mathematically as points in a high-dimensional gene expression space. Genes interact with each other in regulatory networks, restricting the cellular gene expression profiles to a certain manifold, or surface, in gene expression space. To obtain knowledge about this manifold, various dimensionality reduction methods and distance metrics are used. For data points distributed on curved manifolds, a sensible distance measure would be the geodesic distance along the manifold. In this work, we examine whether an approximate geodesic distance measure captures biological similarities better than the traditionally used Euclidean distance. RESULTS: We computed approximate geodesic distances, determined by the Isomap algorithm, for one set of lymphoma and one set of lung cancer microarray samples. Compared with the ordinary Euclidean distance metric, this distance measure produced more instructive, biologically relevant visualizations when applying multidimensional scaling. This suggests the Isomap algorithm as a promising tool for the interpretation of microarray data. Furthermore, the results demonstrate the benefit and importance of taking nonlinearities in gene expression data into account.
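The difference between the two distance measures in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it covers only the distance step commonly associated with Isomap (shortest paths through a k-nearest-neighbour graph), not the subsequent multidimensional scaling. The toy data (50 points on a unit half-circle), the choice k=2, and all function names are assumptions made for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((euclidean(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: graph (approximate geodesic) distances from source."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy manifold: 50 points evenly spaced on a unit half-circle.
points = [(math.cos(math.pi * t / 49), math.sin(math.pi * t / 49)) for t in range(50)]
graph = knn_graph(points, k=2)

geo = shortest_path_lengths(graph, 0)[49]  # distance along the curve
euc = euclidean(points[0], points[49])     # distance across the gap
print(f"approximate geodesic: {geo:.3f}, Euclidean: {euc:.3f}")
```

For the two end points of the half-circle, the graph distance comes out close to the arc length π, whereas the Euclidean distance is the diameter, 2. On real expression data the neighbourhood size would have to be tuned, and a graph that is too sparse can fall apart into disconnected components.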

5.
6.
7.
8.
The R package mosclust (model order selection for clustering problems) implements algorithms based on the concept of stability for discovering significant structures in bio-molecular data. The software library provides stability indices obtained through different data perturbation methods (resampling, random projections, noise injection), as well as statistical tests to assess the significance of multi-level structures singled out from the data. Availability: http://homes.dsi.unimi.it/~valenti/SW/mosclust/download/mosclust_1.0.tar.gz. Supplementary information: http://homes.dsi.unimi.it/~valenti/SW/mosclust.

9.

Background  

A common observation in the analysis of gene expression data is that many genes display similarity in their expression patterns and therefore appear to be co-regulated. However, the variation associated with microarray data and the complexity of the experimental designs make the acquisition of co-expressed genes a challenge. We developed a novel method for Extracting microarray gene expression Patterns and Identifying co-expressed Genes, designated as EPIG. The approach utilizes the underlying structure of gene expression data to extract patterns and identify co-expressed genes that are responsive to experimental conditions.

10.
11.
Principal component analysis of compositional data   Total citations: 7 (self-citations: 0, citations by others: 7)
Aitchison, J. Biometrika, 1983, 70(1): 57-65

12.

Background

The Illumina HumanMethylation450 BeadChip (HM450K) measures the DNA methylation of 485,512 CpGs in the human genome. The technology relies on hybridization of genomic fragments to probes on the chip. However, certain genomic factors may compromise the ability to measure methylation using the array such as single nucleotide polymorphisms (SNPs), small insertions and deletions (INDELs), repetitive DNA, and regions with reduced genomic complexity. Currently, there is no clear method or pipeline for determining which of the probes on the HM450K bead array should be retained for subsequent analysis in light of these issues.

Results

We comprehensively assessed the effects of SNPs, INDELs, repeats and bisulfite induced reduced genomic complexity by comparing HM450K bead array results with whole genome bisulfite sequencing. We determined which CpG probes provided accurate or noisy signals. From this, we derived a set of high-quality probes that provide unadulterated measurements of DNA methylation.

Conclusions

Our method significantly reduces the risk of false discoveries when using the HM450K bead array, while maximising the power of the array to detect methylation status genome-wide. Additionally, we demonstrate the utility of our method through extraction of biologically relevant epigenetic changes in prostate cancer.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-51) contains supplementary material, which is available to authorized users.

13.
MOTIVATION: Class distinction is a supervised learning approach that has been successfully employed in the analysis of high-throughput gene expression data. Identification of a set of genes that predicts differential biological states allows for the development of basic and clinical scientific approaches to the diagnosis of disease. The Independent Consistent Expression Discriminator (ICED) was designed to provide a more biologically relevant search criterion during predictor selection by embracing the inherent variability of gene expression in any biological state. The four components of ICED are (i) normalization of raw data; (ii) assignment of weights to genes from both classes; (iii) counting of votes to determine the optimal number of predictor genes for class distinction; (iv) calculation of prediction strengths for classification results. The search criterion employed by ICED is designed to identify not only genes that are consistently expressed at one level in one class and at a consistently different level in another class, but also genes that are variable in one class and consistent in another. The result is a novel approach to accurately select biologically relevant predictors of differential disease states from a small number of microarray samples. RESULTS: ICED was used to analyze the large AML/ALL training and test data set (Golub et al., 1999, Science, 286, 531-537) in addition to a smaller data set, generated for this study, from an animal model of the childhood neurodegenerative disorder Batten disease. Both of the analyses presented herein have correctly predicted biologically relevant perturbations that can be used for disease classification, irrespective of sample size. Furthermore, the results have provided candidate proteins for future study in understanding the disease process and the identification of potential targets for therapeutic intervention.

14.

Background  

DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resultant mass of data lies the problem of analyzing and presenting information on this genomic scale, and a first step towards the rapid and comprehensive interpretation of these data is gene clustering with respect to the expression patterns. Classifying genes into clusters can lead to interesting biological insights. In this study, we describe an iterative clustering approach to uncover biologically coherent structures from DNA microarray data based on a novel clustering algorithm, EP_GOS_Clust.

15.
16.
The authors tested a new procedure for the discrimination of EPs obtained in different stimulus situations. In contrast with principal component analysis (PCA), which has so far been used for data compression, the method referred to as canonical component analysis (CCA) is optimal for discrimination. To illustrate this, the authors performed both PCA and CCA on the same material, carried out discriminant analysis (SDWA) on the data transformed in each way, and compared the performance of the two procedures in discrimination. In view of both theoretical and practical considerations, the authors recommend that researchers in EP studies use CCA instead of PCA for data reduction carried out for discrimination.

17.
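[Objective] To study, at the genome level, the molecular characteristics of the abnormal metabolism associated with low-temperature autolysis in the straw mushroom (Volvariella volvacea). [Methods] A whole-genome phylogenetic analysis was performed on 21 fungal species, and representative species among them were then selected for comparative genomic analysis to study systematically the molecular characteristics of the abnormal metabolism of the straw mushroom. [Results] The whole-genome phylogenetic analysis showed that the straw mushroom lies at the base of the cluster formed by the straw-rotting fungi. Based on the whole-genome phylogenetic tree, and because basidiomycetes and ascomycetes follow entirely different evolutionary paths, 9 representative basidiomycete species were selected for comparative genomic analysis. The results showed that, compared with other straw-rotting fungi, the gene families of the straw mushroom show a tendency to contract. A further comparison of the numbers of gene families in different size categories showed that three straw mushroom gene families larger than 200 (fam1, fam4 and fam6) had each expanded significantly and were also significantly larger in total than in the other species, indicating that this molecular characteristic of the straw mushroom is related to its abnormal metabolism. [Conclusion] The significant expansion of three straw mushroom gene families (larger than 200) suggests that the functions of these specific gene families are strengthened, which is very likely closely related to the low-temperature autolysis of the straw mushroom.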

18.
19.

Background  

Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model.

20.