
1.

Component retention in principal component analysis with application to cDNA microarray data

Richard Cangelosi Alain Goriely 《Biology direct》2007,2(1):2-21

Shannon entropy is used to provide an estimate of the number of interpretable components in a principal component analysis. In addition, several ad hoc stopping rules for dimension determination are reviewed and a modification of the broken stick model is presented. The modification incorporates a test for the presence of an "effective degeneracy" among the subspaces spanned by the eigenvectors of the correlation matrix of the data set and then allocates the total variance among subspaces. A summary of the performance of the methods applied to both published microarray data sets and to simulated data is given.
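The two kinds of stopping rule named in this abstract can be illustrated with a minimal sketch. It assumes the classical (unmodified) broken stick rule, in which component k of p is kept while its share of the total variance exceeds (1/p) * sum(1/i for i = k..p). It also assumes exp(H) of the normalized eigenvalues as one common entropy-based count. Neither is claimed to be the paper's exact estimator, and the function names and example eigenvalues are hypothetical.

```python
import math


def broken_stick_expectations(p):
    """Expected variance share of each of p components under the broken stick model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def broken_stick_retained(eigenvalues):
    """Count leading components whose variance share exceeds the broken stick share."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    expected = broken_stick_expectations(len(values))
    retained = 0
    for value, share in zip(values, expected):
        if value / total <= share:
            break  # stop at the first component that fails the rule
        retained += 1
    return retained


def entropy_effective_count(eigenvalues):
    """exp(Shannon entropy) of the normalized eigenvalues (an assumed estimator)."""
    total = sum(eigenvalues)
    shares = [v / total for v in eigenvalues if v > 0]
    entropy = -sum(s * math.log(s) for s in shares)
    return math.exp(entropy)


# Hypothetical eigenvalues of a 5-variable correlation matrix.
example = [2.6, 1.2, 0.6, 0.4, 0.2]
print(broken_stick_retained(example))
print(round(entropy_effective_count(example), 2))
```

The entropy-based count varies continuously between 1 (all variance in one component) and p (variance spread evenly), whereas the broken stick rule returns an integer cut-off.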

2.

Effective dimensionality of large-scale expression data using principal component analysis

Hörnquist M Hertz J Wahde M 《Bio Systems》2002,65(2-3):147-156

Large-scale expression data are today measured for thousands of genes simultaneously. This development is followed by an exploration of theoretical tools to get as much information out of these data as possible. One line is to try to extract the underlying regulatory network. The models used thus far, however, contain many parameters, and a careful investigation is necessary in order not to over-fit the models. We employ principal component analysis to show how, in the context of linear additive models, one can get a rough estimate of the effective dimensionality (the number of information-carrying dimensions) of large-scale gene expression datasets. We treat both the lack of independence of different measurements in a time series and the fact that measurements are subject to some level of noise, both of which reduce the effective dimensionality and thereby constrain the complexity of models which can be built from the data.

3.

Use of principal component analysis and the GE-biplot for the graphical exploration of gene expression data

Pittelkow Y Wilson SR 《Biometrics》2005,61(2):630-2; discussion 632-4

This note is in response to Wouters et al. (2003, Biometrics 59, 1131-1139) who compared three methods for exploring gene expression data. Contrary to their summary that principal component analysis is not very informative, we show that it is possible to determine principal component analyses that are useful for exploratory analysis of microarray data. We also present another biplot representation, the GE-biplot (Gene Expression biplot), that is a useful method for exploring gene expression data with the major advantage of being able to aid interpretation of both the samples and the genes relative to each other.

4.

Effective dimensionality for principal component analysis of time series expression data

Hörnquist M Hertz J Wahde M 《Bio Systems》2003,71(3):311-317

Large-scale expression data are today measured for thousands of genes simultaneously. This development has been followed by an exploration of theoretical tools to get as much information out of these data as possible. Several groups have used principal component analysis (PCA) for this task. However, since this approach is data-driven, care must be taken in order not to analyze the noise instead of the data. As a strong warning against uncritical use of the output from a PCA, we employ a newly developed procedure to judge the effective dimensionality of a specific data set. Although this data set is obtained during the development of the rat central nervous system, our finding is a general property of noisy time series data. Based on knowledge of the noise level for the data, we find that the effective number of dimensions that are meaningful to use in a PCA is much lower than what could be expected from the number of measurements. We attribute this fact both to effects of noise and the lack of independence of the expression levels. Finally, we explore the possibility of increasing the dimensionality by performing more measurements within one time series, and conclude that this is not a fruitful approach.

5.

Principal component analysis for clustering gene expression data (cited by: 15; self-citations: 0; citations by others: 15)

Yeung KY Ruzzo WL 《Bioinformatics (Oxford, England)》2001,17(9):763-774

MOTIVATION: There is a great need to develop analytical methodology to analyze and to exploit the information contained in gene expression data. Because of the large number of genes and the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. Other classical techniques, such as principal component analysis (PCA), have also been applied to analyze gene expression data. Using different data analysis techniques and different clustering algorithms to analyze the same data set can lead to very different conclusions. Our goal is to study the effectiveness of principal components (PCs) in capturing cluster structure. Specifically, using both real and synthetic gene expression data sets, we compared the quality of clusters obtained from the original data to the quality of clusters obtained after projecting onto subsets of the principal component axes. RESULTS: Our empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality. In particular, the first few PCs (which contain most of the variation in the data) do not necessarily capture most of the cluster structure. We also showed that clustering with PCs has different impact on different algorithms and different similarity metrics. Overall, we would not recommend PCA before clustering except in special circumstances.

6.

Probabilistic principal component analysis for metabolomic data

Gift Nyamundanda Lorraine Brennan Isobel Claire Gormley 《BMC bioinformatics》2010,11(1):571

Background

Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model.

7.

Incremental genetic K-means algorithm and its application in gene expression data analysis (cited by: 1; self-citations: 0; citations by others: 1)

Yi Lu Shiyong Lu Farshad Fotouhi Youping Deng Susan J Brown 《BMC bioinformatics》2004,5(1):172

Background

In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, SOM, etc., genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as microarrays, new efficient and effective methods for clustering must be developed to process this growing amount of biological data.

8.

A principal component analysis program

P Capy 《The Journal of heredity》1985,76(5):401-402


9.

Pre-processing of chromatographic data for principal component analysis

M. E. Pate N. F. Thornhill R. Chandwani M. Hoare N. J. Titchener-Hooker 《Bioprocess and biosystems engineering》1998,19(4):297-305

This paper examines the selection of the appropriate representation of chromatogram data prior to using principal component analysis (PCA), a multivariate statistical technique, for the diagnosis of chromatogram data sets. The effects of four process variables (flow rate, temperature, loading concentration and loading volume) were investigated for a size exclusion chromatography system used to separate three components (monomer, dimer, trimer). The study showed that major positional shifts in the elution peaks that result when running the separation at different flow rates caused the effects of other variables to be masked if the PCA is performed using elapsed time as the comparative basis. Two alternative methods of representing the data in chromatograms are proposed. In the first, data were converted to a volumetric basis prior to performing the PCA, while in the second, having made this transformation, the data were adjusted to account for the total material loaded during each separation. Two datasets were analysed to demonstrate the approaches. The results show that, by appropriate selection of the basis prior to the analysis, significantly greater process insight can be gained from the PCA, and demonstrate the importance of pre-processing prior to such analysis.

10.

A web-based tool for principal component and significance analysis of microarray data (cited by: 8; self-citations: 0; citations by others: 8)

Sharov AA Dudekula DB Ko MS 《Bioinformatics (Oxford, England)》2005,21(10):2548-2549

We have developed a program for microarray data analysis, which features the false discovery rate for testing statistical significance and the principal component analysis using the singular value decomposition method for detecting the global trends of gene-expression patterns. Additional features include analysis of variance with multiple methods for error variance adjustment, correction of cross-channel correlation for two-color microarrays, identification of genes specific to each cluster of tissue samples, biplot of tissues and corresponding tissue-specific genes, clustering of genes that are correlated with each principal component (PC), three-dimensional graphics based on virtual reality modeling language and sharing of PC between different experiments. The software also supports parameter adjustment, gene search and graphical output of results. The software is implemented as a web tool and thus the speed of analysis does not depend on the power of a client computer. AVAILABILITY: The tool can be used on-line or downloaded at http://lgsun.grc.nia.nih.gov/ANOVA/

11.

An application of principal component analysis in genetics

V. Abeywardena 《Journal of genetics》1972,61(1):27-51


12.

A simple approach to guide factor retention decisions when applying principal component analysis to biomechanical data

Steven L. Fischer Robin H. Hampton 《Computer methods in biomechanics and biomedical engineering》2014,17(3):199-203

The use of principal component analysis (PCA) as a multivariate statistical approach to reduce complex biomechanical data-sets is growing. With its increased application in biomechanics, there has been a concurrent divergence in the use of criteria to determine how much the data is reduced (i.e. how many principal factors are retained). This short communication presents power equations to support the use of a parallel analysis (PA) criterion as a quantitative and transparent method for determining how many factors to retain when conducting a PCA. Monte Carlo simulation was used to carry out PCA on random data-sets of varying dimension. This process mimicked the PA procedure that would be required to determine principal component (PC) retention for any independent study in which the data-set dimensions fell within the range tested here. A surface was plotted for each of the first eight PCs, expressing the expected outcome of a PA as a function of the dimensions of a data-set. A power relationship was used to fit the surface, facilitating the prediction of the expected outcome of a PA as a function of the dimensions of a data-set. Coefficients used to fit the surface and facilitate prediction are reported. These equations enable the PA to be freely adopted as a criterion to inform PC retention. A transparent and quantifiable criterion to determine how many PCs to retain will enhance the ability to compare and contrast between studies.
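The Monte Carlo procedure that this abstract calls parallel analysis can be sketched as follows. This is only an illustration under stated assumptions: the random data are standard normal, the threshold is the 95th percentile of the simulated eigenvalues, and the eigenvalues of the correlation matrix are found with a plain Jacobi rotation routine. None of these choices is taken from the paper, its fitted power equations are not reproduced, and all function names are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(i + 1, p))
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (i, j) off-diagonal entry.
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def parallel_analysis(rows, n_sims=100, quantile=0.95, seed=0):
    """Return (retained, observed eigenvalues, simulated thresholds)."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    observed = symmetric_eigenvalues(correlation_matrix(rows))
    sims = []
    for _ in range(n_sims):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        sims.append(symmetric_eigenvalues(correlation_matrix(noise)))
    index = min(n_sims - 1, int(quantile * n_sims))
    thresholds = [sorted(s[k] for s in sims)[index] for k in range(p)]
    retained = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break  # keep only the leading components that beat random data
        retained += 1
    return retained, observed, thresholds


# Hypothetical data: 60 observations of 5 variables sharing one common factor.
rng = random.Random(1)
data = []
for _ in range(60):
    factor = rng.gauss(0.0, 1.0)
    data.append([factor + 0.3 * rng.gauss(0.0, 1.0) for _ in range(5)])
print(parallel_analysis(data)[0])
```

Each call repeats the full simulation for the data-set's dimensions, which is the cost that a fitted prediction of the thresholds, as described in the abstract, is meant to avoid.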

13.

Integrated analysis of gene expression and copy number data on gene shaving using independent component analysis

Sheng J Deng HW Calhoun VD Wang YP 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(6):1568-1579

DNA microarray gene expression and microarray-based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of the genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we proposed a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First we used ICA to analyze joint measurements, gene expression and copy number, of a biological system and project the data onto statistically independent biological processes. Next, we used these results to identify patterns of variation in the data and then applied an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.

14.

Mining gene expression data by interpreting principal components

Joseph C Roden Brandon W King Diane Trout Ali Mortazavi Barbara J Wold Christopher E Hart 《BMC bioinformatics》2006,7(1):194-22

Background

There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering; e.g. genes that are highly up-regulated and/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis.

15.

Clustering gene expression data with kernel principal components

Liu Z Chen D Bensmail H Xu Y 《Journal of bioinformatics and computational biology》2005,3(2):303-316

Kernel principal component analysis (KPCA) has been applied to data clustering and graphic cut in the last couple of years. This paper discusses the application of KPCA to microarray data clustering. A new algorithm based on KPCA and fuzzy C-means is proposed. Experiments with microarray data show that the proposed algorithm is in general superior to traditional algorithms.

16.

Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data

Tan Y Shi L Tong W Wang C 《Nucleic acids research》2005,33(1):56-65

DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing the high-dimensional gene expression data, typically with thousands of variables (genes) and far fewer observations (samples), in which severe collinearity is often observed. This makes it difficult to apply the classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) was proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both independent variables and dependent variables. One of the salient features of our method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation using four well-known microarray datasets. The stabilities and reliabilities of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to decrease the computation time dramatically. (MATLAB source code is available upon request.)

17.

The biplot graphic display of matrices with application to principal component analysis (cited by: 30; self-citations: 0; citations by others: 30)

GABRIEL K. R. 《Biometrika》1971,58(3):453-467


18.

Catheter ablation outcome prediction in persistent atrial fibrillation using weighted principal component analysis

Marianna Meo Vicente Zarzoso Olivier Meste Decebal G. Latcu Nadir Saoudi 《Biomedical signal processing and control》2013,8(6):958-968


19.

Beyond principal component analysis: Canonical component analysis for data reduction in classification of EPs

《International journal of bio-medical computing》1984,15(2):93-111

The authors tested a new procedure for the discrimination of EPs obtained in different stimulus situations. In contrast with principal component analysis (PCA) used so far for the purpose of data compression, the method referred to as canonical component analysis (CCA) is optimal for the purpose of discrimination. To illustrate this, the authors performed both PCA and CCA for the same material, then after carrying out discriminant analysis (SDWA) for the data transformed in this way, compared the performance of the two procedures in discrimination. In view of both the theoretical and practical considerations, the authors recommend that in the future researchers use CCA instead of PCA in EP studies for data reduction carried out for discrimination.

20.

Normalization of single-channel DNA array data by principal component analysis

Stoyanova R Querec TD Brown TR Patriotis C 《Bioinformatics (Oxford, England)》2004,20(11):1772-1784

MOTIVATION: Detailed comparison and analysis of the output of DNA gene expression arrays from multiple samples require global normalization of the measured individual gene intensities from the different hybridizations. This is needed for accounting for variations in array preparation and sample hybridization conditions. RESULTS: Here, we present a simple, robust and accurate procedure for the global normalization of datasets generated with single-channel DNA arrays based on principal component analysis. The procedure makes minimal assumptions about the data and performs well in cases where other standard procedures produced biased estimates. It is also insensitive to data transformation, filtering (thresholding) and pre-screening.