Similar literature
 20 similar documents found; search time 15 ms
1.
Raman spectra provide rich but complex information about the chemical constituents of biological samples. Digital processing techniques are usually needed to extract the spectra of chemical constituents and their associated concentration profiles. However, spectral signatures may be transformed relative to those recorded on pure constituents, and these techniques require a priori knowledge of the spectra to be estimated. In this study, we propose to analyse paraffin-embedded skin biopsies of malignant and benign tumors, intended for oncology research, by Raman spectroscopy and advanced signal processing methods. We show that the commonly used principal component analysis (PCA) does not give physically interpretable estimators of spectra and associated concentration profiles. Based on a linear model and taking into account the statistical properties of spectra, independent component analysis (ICA) is used to better estimate the spectra of chemical constituents. The estimators of associated concentration profiles are no longer orthogonal and have only positive values, contrary to PCA. ICA allows the paraffin to be modeled by three Raman spectra and provides good estimators of the underlying spectra of human skin, which is of great interest in oncology since the retrieval of spectral features of different types of skin tumors is sufficient for their discrimination.

2.
Interpreting the complex interplay of metabolites in heterogeneous biosamples still poses a challenging task. In this study, we propose independent component analysis (ICA) as a multivariate analysis tool for the interpretation of large-scale metabolomics data. In particular, we employ a Bayesian ICA method based on a mean-field approach, which allows us to statistically infer the number of independent components to be reconstructed. The advantage of ICA over correlation-based methods like principal component analysis (PCA) is the utilization of higher order statistical dependencies, which not only yield additional information but also allow a more meaningful representation of the data with fewer components. We performed the described ICA approach on a large-scale metabolomics data set of human serum samples, comprising a total of 1764 study probands with 218 measured metabolites. Inspecting the source matrix of statistically independent metabolite profiles using a weighted enrichment algorithm, we observe strong enrichment of specific metabolic pathways in all components. This includes signatures from amino acid metabolism, energy-related processes, carbohydrate metabolism, and lipid metabolism. Our results imply that the human blood metabolome is composed of a distinct set of overlaying, statistically independent signals. ICA furthermore produces a mixing matrix, describing the strength of each independent component for each of the study probands. Correlating these values with plasma high-density lipoprotein (HDL) levels, we establish a novel association between HDL plasma levels and the branched-chain amino acid pathway. We conclude that the Bayesian ICA methodology has the power and flexibility to replace many of the PCA and clustering-based analyses that are now common in the research field.

3.
Independent component analysis (ICA) and blind source separation (BSS) methods are increasingly used to separate individual brain and non-brain source signals mixed by volume conduction in electroencephalographic (EEG) and other electrophysiological recordings. We compared results of decomposing thirteen 71-channel human scalp EEG datasets by 22 ICA and BSS algorithms, assessing the pairwise mutual information (PMI) in scalp channel pairs, the remaining PMI in component pairs, the overall mutual information reduction (MIR) effected by each decomposition, and decomposition 'dipolarity', defined as the number of component scalp maps matching the projection of a single equivalent dipole with less than a given residual variance. The least well-performing algorithm was principal component analysis (PCA); the best performing were AMICA and other likelihood/mutual information based ICA methods. Though these and other commonly used decomposition methods returned many similar components, across 18 ICA/BSS algorithms mean dipolarity varied linearly with both MIR and with PMI remaining between the resulting component time courses, a result compatible with an interpretation of many maximally independent EEG components as being volume-conducted projections of partially-synchronous local cortical field activity within single compact cortical domains. To encourage further method comparisons, the data and software used to prepare the results have been made available (http://sccn.ucsd.edu/wiki/BSSComparison).

4.
In this paper, we propose and implement a hybrid model combining two-directional two-dimensional principal component analysis ((2D)²PCA) and a Radial Basis Function Neural Network (RBFNN) to forecast stock market behavior. First, 36 stock market technical variables are selected as the input features, and a sliding window is used to obtain the input data of the model. Next, (2D)²PCA is utilized to reduce the dimension of the data and extract its intrinsic features. Finally, an RBFNN accepts the data processed by (2D)²PCA to forecast the next day's stock price or movement. The proposed model is applied to the Shanghai stock market index, and the experiments show that the model achieves a good level of fit. The proposed model is then compared with models that use the traditional dimension reduction methods principal component analysis (PCA) and independent component analysis (ICA). The empirical results show that the proposed model outperforms the PCA-based model, as well as alternative models based on ICA and on the multilayer perceptron.

5.
6.
The purpose of this study is to examine whether the application of independent component analysis (ICA) is useful for separating motor unit action potential trains (MUAPTs) from multi-channel surface EMG (sEMG) signals. Eight-channel sEMG signals were recorded from tibialis anterior muscles during isometric dorsiflexions at 5%, 10%, 15% and 20% of maximal voluntary contraction. MUAP waveforms with little time delay between the channels were obtained by arranging the sEMG channels perpendicular to the muscle fibers. The independent components estimated by FastICA were compared with the sEMG signals and with the principal components calculated by principal component analysis (PCA). Our results show that FastICA could separate groups of similar MUAP waveforms in the sEMG signals into individual independent components, while PCA could not sufficiently separate the groups into the principal components. The use of FastICA also gave a greater reduction of interference between different MUAP waveforms. Therefore, it is suggested that, for sEMG signal decomposition, FastICA could provide much better discrimination of the properties of MUAPTs (waveforms, discharge intervals, etc.) than either PCA or the original sEMG signals.

7.
In humans and other mammals, sperm morphology has been considered one of the most important predictive parameters of fertility. The objective was to determine the presence and distribution of sperm head morphometric subpopulations in a nonhuman primate model (Callithrix jacchus), using an objective computer analysis system and principal component analysis (PCA) methods to establish the relationship between the subpopulation distribution observed and among-donor variation. The PCA method revealed a stable number of principal components in all donors studied, which represented more than 85% of the cumulative variance in all cases. After cluster analysis, a variable number (from three to seven) of sperm morphometric subpopulations was identified, with defined sperm dimensions and shapes. There were differences in the distribution of the sperm morphometric subpopulations (P < 0.001) in all ejaculates among the four donors analyzed. In conclusion, in this study, computerized sperm analysis methods combined with PCA cluster analyses were useful to identify, classify, and characterize various sperm head morphometric subpopulations in nonhuman primates, yielding considerable biological information. In addition, because all individuals were kept in the same conditions, differences in the distribution of these subpopulations were not attributed to external or management factors. Finally, the substantial information derived from subpopulation analyses provided new and relevant biological knowledge which may have a practical use for future studies in human and nonhuman primate ejaculates, including identifying individuals more suitable for assisted reproductive technologies.

8.
An improved method for deconvoluting complex spectral maps from bidimensional fluorescence monitoring is presented, relying on a combination of principal component analysis (PCA) and feedforward artificial neural networks (ANN). With the aim of reducing ANN complexity, spectral maps are first subjected to PCA, and the scores of the retained principal components are subsequently used as the ANN input vector. The method is presented using the case study of an extractive membrane biofilm reactor, where fluorescence maps of a membrane-attached biofilm, collected under different reactor operating conditions, were analysed. During ANN training, the spectral information is associated with process performance indicators. Originally, 231 excitation/emission pairs per fluorescence map were used as the ANN input vector. Using PCA, each fluorescence map could be represented by a maximum of six principal components, thereby capturing 99.5% of its variance. As a result, the dimension of the ANN input vector, and hence the complexity of the artificial neural network, was significantly reduced, and ANN training speed was increased. Correlations between principal components and ANN predicted process performance parameters were good, with correlation coefficients on the order of 0.7 or higher.
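
The reduction step described in entry 8 (keep only as many principal components as are needed to reach a variance target, then pass their scores on as inputs) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the helper names (`center`, `covariance`, `top_eigenpair`, `pca_scores`), the power-iteration solver, and the six-feature toy data set are all assumptions made for the example; only the 99.5% variance target comes from the abstract.

```python
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def top_eigenpair(matrix, rng, iterations=300):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    v = [rng.gauss(0, 1) for _ in matrix]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, mat_vec(matrix, v))), v


def pca_scores(rows, variance_target=0.995, seed=0):
    """Extract components until the cumulative variance ratio hits the target."""
    rng = random.Random(seed)
    centered = center(rows)
    cov = covariance(centered)
    total = sum(cov[i][i] for i in range(len(cov)))
    components, explained = [], 0.0
    while explained / total < variance_target and len(components) < len(cov):
        lam, v = top_eigenpair(cov, rng)
        components.append(v)
        explained += lam
        # Deflation: remove the component just found from the covariance matrix.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, components, explained / total


# Hypothetical stand-in for fluorescence maps: 40 samples, 6 features that are
# driven by two hidden factors plus a small amount of measurement noise.
data_rng = random.Random(1)
maps = []
for _ in range(40):
    g1, g2 = data_rng.gauss(0, 3), data_rng.gauss(0, 1)
    maps.append([g1 + data_rng.gauss(0, 0.01) for _ in range(3)]
                + [g2 + data_rng.gauss(0, 0.01) for _ in range(3)])

scores, components, ratio = pca_scores(maps)
print(len(components), "components retained; variance ratio", round(ratio, 5))
```

In this toy setting the `scores` rows would play the role of the reduced ANN input vectors; the network itself is outside the scope of the sketch.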

9.
M Crescenzi, A Giuliani. FEBS Letters, 2001, 507(1): 114-118
By using principal components analysis (PCA) we demonstrate here that the information relevant to tumor line classification linked to the activity of 1375 genes expressed in 60 tumor cell lines can be reproduced by only five independent components. These components can be interpreted as cell motility and migration, cellular trafficking and endo/exocytosis, and epithelial character. PCA, at odds with cluster analysis methods routinely used in microarray analysis, allows for the participation of individual genes in multiple biochemical pathways, while assigning to each cell line a quantitative score reflecting fundamental biological functions.

10.
A new computational procedure to resolve the contribution of Photosystem I (PSI) and Photosystem II (PSII) to the leaf chlorophyll fluorescence emission spectra at room temperature has been developed. It is based on the Principal Component Analysis (PCA) of the leaf fluorescence emission spectra measured during the OI photochemical phase of fluorescence induction kinetics. During this phase, we can assume that only two spectral components are present, one of which is constant (PSI) and the other variable in intensity (PSII). Application of the PCA method to the measured fluorescence emission spectra of Ficus benjamina L. shows that the temporal variation in the spectra can be ascribed to a single spectral component (the first principal component extracted by PCA), which can be considered to be a good approximation of the PSII fluorescence emission spectrum. The PSI fluorescence emission spectrum was deduced by difference between measured spectra and the first principal component. A single-band spectrum for the PSI fluorescence emission, peaking at about 735 nm, and a 2-band spectrum with maxima at 685 and 740 nm for the PSII were obtained. A linear combination of only these two spectral shapes produced a good fit for any measured emission spectrum of the leaf under investigation and can be used to obtain the fluorescence emission contributions of photosystems under different conditions. With the use of our approach, the dynamics of energy distribution between the two photosystems, such as state transition, can be monitored in vivo, directly at physiological temperatures. Separation of the PSI and PSII emission components can improve the understanding of the fluorescence signal changes induced by environmental factors or stress conditions on plants.
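
The two-component assumption in entry 10 has a simple consequence that can be checked numerically: if every spectrum is a constant shape plus a variable multiple of a second shape, then after mean-centering all variation lies along the second shape, so the first principal component is proportional to it. The sketch below illustrates only that point. The Gaussian band shapes, widths, intensities, and noise level are invented for the example (only the band positions near 685, 735 and 740 nm echo the abstract), and the function names are hypothetical.

```python
import math
import random


def gaussian_band(wavelengths, center, width):
    """Toy emission band; real band shapes are not Gaussian in general."""
    return [math.exp(-((w - center) / width) ** 2) for w in wavelengths]


def first_principal_component(spectra, iterations=200):
    """First PC of the mean-centered spectra, found by power iteration."""
    n = len(spectra)
    mean = [sum(col) / n for col in zip(*spectra)]
    centered = [[x - m for x, m in zip(row, mean)] for row in spectra]
    v = [1.0] * len(mean)
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        proj = [sum(x * c for x, c in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered))
             for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return abs(dot) / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


wavelengths = list(range(650, 801, 5))
# Assumed shapes: one constant band (PSI-like) and one two-band variable shape.
psi = gaussian_band(wavelengths, 735, 25)
psii = [a + 0.4 * b for a, b in zip(gaussian_band(wavelengths, 685, 12),
                                    gaussian_band(wavelengths, 740, 20))]

noise = random.Random(0)
intensities = [0.2 + 0.1 * t for t in range(9)]  # variable component rises in time
spectra = [[p1 + a * p2 + noise.gauss(0, 0.001) for p1, p2 in zip(psi, psii)]
           for a in intensities]

pc1 = first_principal_component(spectra)
similarity = cosine(pc1, psii)
print("cosine similarity between PC1 and the variable shape:", round(similarity, 4))
```

The sketch stops at recovering the variable shape; how the constant component is then obtained "by difference" is not specified in enough detail in the abstract to reproduce here.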

11.
When whole-cell extracts are analyzed, proton nuclear magnetic resonance (1H NMR) spectroscopy provides biochemical profiles that contain overlapping signals of the majority of the compounds. To determine whether cyanobacteria could be taxonomically discriminated on the basis of metabolic fingerprinting, we subjected whole-cell extracts of the cyanobacteria to 1H NMR. The 1H NMR spectra revealed a predominance of signals in the aliphatic region. Principal component analysis (PCA) of the data then enabled discrimination of the cyanobacteria. The hierarchical dendrogram, based on PCA of the aliphatic region data, showed that six cyanobacterial taxa were discriminated from two eukaryotic microalgal species, and that the six taxa could be subsequently divided into three groups. This agrees with the current taxonomy of cyanobacteria. Therefore, our overall results indicate that metabolic fingerprinting using 1H NMR spectra and multivariate statistical analysis provides a simple, rapid method for the taxonomical discrimination of cyanobacteria.

12.
Gene expression datasets are large and complex, having many variables and unknown internal structure. We apply independent component analysis (ICA) to derive a less redundant representation of the expression data. The decomposition produces components with minimal statistical dependence and reveals biologically relevant information. Subsequently, we apply cluster analysis to the transformed data (cluster analysis being an important and popular tool for obtaining an initial understanding of the data, usually employed for class discovery). The proposed self-organizing map (SOM)-based clustering algorithm automatically determines the number of 'natural' subgroups of the data, aided in this task by the available prior knowledge of the functional categories of genes. An entropy criterion allows each gene to be assigned to multiple classes, which is closer to the biological representation. These features, however, are not achieved at the cost of the simplicity of the algorithm, since the map grows on a simple grid structure and the learning algorithm remains identical to Kohonen's.

13.
The development of fast and effective spectroscopic methods that can detect most compounds in an untargeted manner is of increasing interest in plant extract fingerprinting or profiling projects. Metabolite fingerprinting by nuclear magnetic resonance (NMR) is a fast growing field which is increasingly applied for quality control of herbal products, mostly via 1D 1H NMR coupled to multivariate data analysis. Nevertheless, signal overlap is a common problem in 1H NMR profiles that hinders metabolite identification and results in incomplete data interpretation. Herein, we introduce a novel approach for coupling 2D NMR datasets with principal component analysis (PCA), exemplified by hop resin classification. Heteronuclear multiple bond correlation (HMBC) profile maps of hop resins (Humulus lupulus) were generated for a comparative study of 13 hop cultivars. The method described herein combines reproducible metabolite fingerprints with a minimal sample preparation effort and an experimental time of ca. 28 min per sample, comparable to that of a standard HPLC run. Moreover, HMBC spectra not only provide unequivocal assignment of the major secondary metabolites of hop, but also allow the identification of several isomerization and degradation products of hop bitter acids, including the sedative principle of hop (2-methylbut-3-en-2-ol). We believe that combining 2D NMR datasets with chemometrics, i.e. PCA, has great potential for application in other plant metabolome projects on (commercially relevant) nutraceuticals and/or herbal drugs.

14.
Codon substitution models have traditionally been parametric Markov models, but recently, empirical and semiempirical models have also been proposed. Parametric codon models are typically based on 61×61 rate matrices that are derived from a small number of parameters. These parameters are rooted in experience and theoretical considerations and generally show good performance but are still relatively arbitrary. We have previously used principal component analysis (PCA) on data obtained from mammalian sequence alignments to empirically identify the most relevant parameters for codon substitution models, thereby confirming some commonly used parameters but also suggesting new ones. Here, we present a new semiempirical codon substitution model that is directly based on those PCA results. The substitution rate matrix is constructed from linear combinations of the first few (the most important) principal components, with the coefficients being free model parameters. Thus, the model is not only based on empirical rates but also uses the empirically determined most relevant parameters for a codon model to adjust to the particularities of individual data sets. In comparisons against established parametric and semiempirical models, the new model consistently achieves the highest likelihood values when applied to sequences of vertebrates, which include the taxonomic class on which the model was trained.

15.
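Liu Renlin (刘仁林). 《生物多样性》 (Biodiversity Science), 1994, 2(3): 173-176
This paper uses principal component analysis to study the geographic distribution patterns of the natural attributes of forest nature reserves in Jiangxi. The aim is to find a set of reliable natural attributes with which to predict suitable sites for establishing forest nature reserves, and to evaluate how reasonably the 5 existing provincial-level nature reserves are distributed. The study shows that the first three principal components account for 87.73% of the total information, so the dimension reduction works well. By analyzing the loadings of the different natural attributes on the first three principal components and the dispersion among them, 12 natural attributes were identified as effective factors for evaluating and predicting nature reserves. This demonstrates that PCA is well suited to the planning and study of nature reserves.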

16.
GWAS has greatly facilitated the discovery of risk SNPs associated with complex diseases. Traditional methods analyze SNPs individually and are limited by low power and reproducibility since correction for multiple comparisons is necessary. Several methods have been proposed based on grouping SNPs into SNP sets using biological knowledge and/or genomic features. In this article, we compare the linear kernel machine based test (LKM) and the principal components analysis based approach (PCA) using simulated datasets under scenarios of 0 to 3 causal SNPs, as well as simple and complex linkage disequilibrium (LD) structures of the simulated regions. Our simulation study demonstrates that both LKM and PCA can control the type I error at the significance level of 0.05. If the causal SNP is in strong LD with the genotyped SNPs, both the PCA with a small number of principal components (PCs) and the LKM with a linear or identity-by-state kernel are valid tests. However, if the LD structure is complex, such as several LD blocks in the SNP set, or when the causal SNP is not in the LD block in which most of the genotyped SNPs reside, more PCs should be included to capture the information of the causal SNP. Simulation studies also demonstrate the ability of LKM and PCA to combine information from multiple causal SNPs and to provide increased power over individual SNP analysis. We also apply LKM and PCA to analyze two SNP sets extracted from an actual GWAS dataset on non-small cell lung cancer.

17.

Background  

Modeling of gene expression data from time course experiments often involves the use of linear models such as those obtained from principal component analysis (PCA), independent component analysis (ICA), or other methods. Such methods do not generally yield factors with a clear biological interpretation. Moreover, implicit assumptions about the measurement errors often limit the application of these methods to log-transformed data, destroying linear structure in the untransformed expression data.

18.
In the analysis of bioinformatics data, a unique challenge arises from the high dimensionality of measurements. Without loss of generality, we use a genomic study with gene expression measurements as a representative example, but note that the analysis techniques discussed in this article are also applicable to other types of bioinformatics studies. Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expressions, called principal components (PCs). The PCs are orthogonal to each other, can effectively explain variation of gene expressions, and may have a much lower dimensionality. PCA is computationally simple and can be realized using many existing software packages. This article consists of the following parts. First, we review the standard PCA technique and its applications in bioinformatics data analysis. Second, we describe recent 'non-standard' applications of PCA, including accommodating interactions among genes, pathways and network modules, and conducting PCA with estimating equations as opposed to gene expressions. Third, we introduce several recently proposed PCA-based techniques, including the supervised PCA, sparse PCA and functional PCA. The supervised PCA and sparse PCA have been shown to have better empirical performance than the standard PCA. The functional PCA can analyze time-course gene expression data. Last, we raise awareness of several critical but unsolved problems related to PCA. The goal of this article is to make bioinformatics researchers aware of the PCA technique and, more importantly, its most recent developments, so that this simple yet effective dimension reduction technique can be better employed in bioinformatics data analysis.

19.
Principal component analysis for clustering gene expression data   Cited by: 15 (self-citations: 0, citations by others: 15)
MOTIVATION: There is a great need to develop analytical methodology to analyze and to exploit the information contained in gene expression data. Because of the large number of genes and the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. Other classical techniques, such as principal component analysis (PCA), have also been applied to analyze gene expression data. Using different data analysis techniques and different clustering algorithms to analyze the same data set can lead to very different conclusions. Our goal is to study the effectiveness of principal components (PCs) in capturing cluster structure. Specifically, using both real and synthetic gene expression data sets, we compared the quality of clusters obtained from the original data to the quality of clusters obtained after projecting onto subsets of the principal component axes. RESULTS: Our empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality. In particular, the first few PCs (which contain most of the variation in the data) do not necessarily capture most of the cluster structure. We also showed that clustering with PCs has a different impact on different algorithms and different similarity metrics. Overall, we would not recommend PCA before clustering except in special circumstances.

20.
Principal Component Analysis (PCA) is a classical technique in statistical data analysis, feature extraction and data reduction, aiming at explaining observed signals as a linear combination of orthogonal principal components. Independent Component Analysis (ICA) is a technique of array processing and data analysis, aiming at recovering unobserved signals or 'sources' from observed mixtures, exploiting only the assumption of mutual independence between the signals. The separation of the sources by ICA has great potential in applications such as the separation of sound signals (like voices mixed in simultaneous multiple recordings, for example), in telecommunication or in the treatment of medical signals. However, ICA is not yet often used by statisticians. In this paper, we present ICA in a statistical framework and compare this method with PCA for electroencephalogram (EEG) analysis. We show that ICA provides a more useful data representation than PCA, for instance, for the representation of a particular characteristic of the EEG named event-related potential (ERP).
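
The contrast drawn in entry 20 (PCA finds orthogonal directions of maximal variance, ICA looks for statistically independent sources) can be made concrete with a two-channel toy mixture. The sketch below is not the method of any paper listed here: it whitens the data with a closed-form 2×2 PCA and then searches rotation angles for the one that maximizes the summed absolute excess kurtosis, which is one simple ICA contrast among many. The sources, the mixing weights, and all function names are assumptions made for the illustration.

```python
import math
import random


def standardize(values):
    """Zero mean, unit variance."""
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd for v in values]


def abs_corr(u, v):
    a, b = standardize(u), standardize(v)
    return abs(sum(x * y for x, y in zip(a, b)) / len(a))


def excess_kurtosis(values):
    z = standardize(values)
    return sum(x ** 4 for x in z) / len(z) - 3.0


def pca_2d(x1, x2):
    """Principal component scores of two channels (closed-form 2x2 rotation)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1, c2 = [v - m1 for v in x1], [v - m2 for v in x2]
    a = sum(v * v for v in c1) / n
    c = sum(v * v for v in c2) / n
    b = sum(u * v for u, v in zip(c1, c2)) / n
    phi = 0.5 * math.atan2(2 * b, a - c)  # angle that diagonalizes the covariance
    cs, sn = math.cos(phi), math.sin(phi)
    pc1 = [cs * u + sn * v for u, v in zip(c1, c2)]
    pc2 = [-sn * u + cs * v for u, v in zip(c1, c2)]
    return pc1, pc2


def ica_2d(x1, x2, steps=90):
    """Whiten with PCA, then pick the rotation with the most non-Gaussian outputs."""
    pc1, pc2 = pca_2d(x1, x2)
    z1, z2 = standardize(pc1), standardize(pc2)  # whitening (assumes rank 2)
    best = None
    for k in range(steps):
        theta = (math.pi / 2) * k / steps
        cs, sn = math.cos(theta), math.sin(theta)
        y1 = [cs * u + sn * v for u, v in zip(z1, z2)]
        y2 = [-sn * u + cs * v for u, v in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


# Hypothetical sources: a sinusoid and uniform noise, mixed into two channels.
rng = random.Random(0)
n = 1000
s1 = standardize([math.sin(2 * math.pi * t / 50) for t in range(n)])
s2 = standardize([rng.uniform(-1, 1) for _ in range(n)])
x1 = [u + 0.5 * v for u, v in zip(s1, s2)]
x2 = [0.5 * u + v for u, v in zip(s1, s2)]

pc1, pc2 = pca_2d(x1, x2)
ic1, ic2 = ica_2d(x1, x2)
pca_match = max(abs_corr(pc1, s1), abs_corr(pc2, s1))
ica_match = max(abs_corr(ic1, s1), abs_corr(ic2, s1))
print("best |correlation| with the sinusoid: PCA", round(pca_match, 3),
      "ICA", round(ica_match, 3))
```

With this symmetric mixing, each principal component is an equal blend of the two sources, whereas the extra rotation step is what lets the ICA output line up with an individual source. A practical analysis would use an established ICA implementation rather than a grid search.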
