Effective dimensionality for principal component analysis of time series expression data |
| |
Authors: | Michael Hörnquist, John Hertz, Mattias Wahde |
| |
Affiliation: | Department of Science and Technology, Linköping University, SE-601 74, Norrköping, Sweden. micho@itn.liu.se |
| |
Abstract: | Large-scale expression data are now measured for thousands of genes simultaneously. This development has been accompanied by the exploration of theoretical tools for extracting as much information from these data as possible. Several groups have used principal component analysis (PCA) for this task. However, since this approach is data-driven, care must be taken not to analyze the noise instead of the data. As a strong warning against uncritical use of the output from a PCA, we employ a newly developed procedure to judge the effective dimensionality of a specific data set. Although this data set was obtained during the development of the rat central nervous system, our finding is a general property of noisy time series data. Based on knowledge of the noise level in the data, we find that the effective number of dimensions that are meaningful to use in a PCA is much lower than could be expected from the number of measurements. We attribute this both to effects of noise and to the lack of independence of the expression levels. Finally, we explore the possibility of increasing the dimensionality by performing more measurements within one time series, and conclude that this is not a fruitful approach. |
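The abstract does not spell out the authors' procedure, so what follows is only a hedged illustration of the general idea, not their method. It is a minimal Python sketch that assumes Gaussian noise with a known standard deviation sigma. It computes the covariance between time points across genes and finds its eigenvalues (the PCA variances) with a plain Jacobi iteration. It then counts the components whose variance exceeds factor * sigma**2. The synthetic data, in which every gene mixes two smooth temporal patterns, the threshold factor of 2, and all function names are hypothetical choices made for this example.

```python
import math
import random


def synthetic_expression(n_genes, n_times, sigma, seed=0):
    """Hypothetical data: each gene mixes two smooth temporal patterns plus Gaussian noise."""
    rng = random.Random(seed)
    times = [k / (n_times - 1) for k in range(n_times)]
    data = []
    for _ in range(n_genes):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # gene-specific pattern weights
        data.append([a * math.sin(2 * math.pi * t) + b * math.cos(2 * math.pi * t)
                     + rng.gauss(0.0, sigma) for t in times])
    return data


def covariance_across_time(data):
    """T x T sample covariance between time points, computed over genes."""
    n_genes, n_times = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n_genes for k in range(n_times)]
    cov = [[0.0] * n_times for _ in range(n_times)]
    for row in data:
        centred = [row[k] - means[k] for k in range(n_times)]
        for i in range(n_times):
            for j in range(i, n_times):
                cov[i][j] += centred[i] * centred[j]
    for i in range(n_times):
        for j in range(i, n_times):
            cov[i][j] /= n_genes - 1
            cov[j][i] = cov[i][j]
    return cov


def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def effective_dimension(data, sigma, factor=2.0):
    """Count principal components whose variance exceeds factor * sigma**2 (hypothetical rule)."""
    variances = jacobi_eigenvalues(covariance_across_time(data))
    return sum(1 for lam in variances if lam > factor * sigma ** 2)


if __name__ == "__main__":
    # Sample the same two-pattern dynamics more densely within one time series.
    for n_times in (8, 32):
        data = synthetic_expression(n_genes=1000, n_times=n_times, sigma=0.5)
        print(n_times, "time points ->", effective_dimension(data, sigma=0.5), "dimensions")
```

Under these assumptions, the count should stay at 2, which is the number of built-in patterns, whether 8 or 32 time points are sampled. Denser sampling within one series adds measurements that are not independent of each other, so it does not add meaningful dimensions.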
| |
Keywords: | |
This article is indexed in PubMed and other databases.