Similar articles
 20 similar articles found (search time: 15 ms)
1.
Pulsed laser-induced autofluorescence spectroscopic studies of pathologically certified normal, premalignant, and malignant oral tissues were carried out at 325 nm excitation. The spectral analysis and classification for discrimination among normal, premalignant, and malignant conditions were performed using principal component analysis (PCA) and artificial neural network (ANN) separately on the same set of spectral data. In the case of PCA, spectral residuals, Mahalanobis distance, and scores of factors were used for discrimination among normal, premalignant, and malignant cases. In ANN, parameters like mean, spectral residual, standard deviation, and total energy were used to train the network. The ANN used in this study is a classical multilayer feed-forward type with a back-propagation algorithm for the training of the network. The specificity and sensitivity were determined in both classification schemes. In the case of PCA, they are 100 and 92.9%, respectively, whereas for ANN they are 100 and 96.5% for the data set considered.

2.
In optimization, the dimension of the problem may severely, sometimes exponentially, increase optimization time. Parametric function approximators (FAPPs) have been suggested to overcome this problem. Here, a novel FAPP, cost component analysis (CCA), is described. In CCA, the search space is resampled according to the Boltzmann distribution generated by the energy landscape. That is, CCA converts the optimization problem to density estimation. The structure of the induced density is searched by independent component analysis (ICA). The advantage of CCA is that each independent ICA component can be optimized separately. In turn, (i) CCA intends to partition the original problem into subproblems and (ii) separating (partitioning) the original optimization problem into subproblems may serve interpretation. Most importantly, (iii) CCA may give rise to high gains in optimization time. Numerical simulations illustrate the working of the algorithm.

3.
Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious disadvantage of traditional PCA when it is applied to analyze data where interpretability is important. In applications where the features have some physical meaning, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of principal components. Sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. The stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to the high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.
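
The proximal gradient idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the Cvx-SPCA algorithm itself (the abstract does not give its objective); it assumes a generic ℓ1-regularized quadratic, 0.5 wᵀAw − bᵀw + λ‖w‖₁, and uses the exact gradient at every step. The function names, the matrix `A`, the vector `b` and all numbers are hypothetical.

```python
def soft_threshold(values, threshold):
    """Proximal operator of threshold * ||w||_1, applied element-wise."""
    shrunk = []
    for v in values:
        if v > threshold:
            shrunk.append(v - threshold)
        elif v < -threshold:
            shrunk.append(v + threshold)
        else:
            shrunk.append(0.0)  # small entries are set exactly to zero
    return shrunk


def proximal_gradient(A, b, lam, step, iterations=200):
    """Minimize 0.5 * w^T A w - b^T w + lam * ||w||_1 (A assumed positive definite)."""
    n = len(b)
    w = [0.0] * n
    for _ in range(iterations):
        # Exact gradient of the smooth part: A w - b.
        grad = [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(n)]
        # Gradient step followed by the l1 shrinkage (proximal) step.
        w = soft_threshold([w[i] - step * grad[i] for i in range(n)], step * lam)
    return w


# Toy problem with made-up numbers: the weakly supported second
# coordinate is driven exactly to zero, which is the sparsity effect.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 0.3]
w = proximal_gradient(A, b, lam=0.5, step=0.5)
print(w)
```

The shrinkage step is what produces exact zeros in the solution; a stochastic variant would replace the exact gradient with an estimate computed from a subset of the data.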

4.
The purpose of this study is to examine whether or not the application of independent component analysis (ICA) is useful for separation of motor unit action potential trains (MUAPTs) from multi-channel surface EMG (sEMG) signals. In this study, eight-channel sEMG signals were recorded from tibialis anterior muscles during isometric dorsi-flexions at 5%, 10%, 15% and 20% maximal voluntary contraction. MUAP waveforms with little time delay between the channels were recorded by arranging the sEMG channels perpendicular to the muscle fibers. The independent components estimated by FastICA were compared with the sEMG signals and the principal components calculated by principal component analysis (PCA). Our results showed that FastICA could separate groups of similar MUAP waveforms in the sEMG signals into individual independent components, while PCA could not sufficiently separate the groups into the principal components. A greater reduction of interference between different MUAP waveforms was demonstrated by the use of FastICA. Therefore, it is suggested that FastICA could provide much better discrimination of the properties of MUAPTs for sEMG signal decomposition, i.e. waveforms, discharge intervals, etc., than not only PCA but also the original sEMG signals.

5.
The complexity of biological processes often makes impractical the development of detailed, structured phenomenological models of the cultivation of microorganisms in bioreactors. In this context, data pre-treatment techniques are useful for bioprocess control and fault detection. Among them, principal component analysis (PCA) plays an important role. This work presents a case study of the application of this technique during real experiments, where the enzyme penicillin G acylase (PGA) was produced by Bacillus megaterium ATCC 14945. PGA hydrolyzes penicillin G to yield 6-aminopenicillanic acid (6-APA) and phenylacetic acid. 6-APA is used to produce semi-synthetic β-lactam antibiotics. A static PCA algorithm was implemented for on-line detection of deviations from the desired process behavior. The experiments were carried out in a 2-L bioreactor. Hotelling’s T² was the discrimination criterion employed in this multivariable problem, and the method showed a high sensitivity for fault detection in all real cases that were studied.
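
As a rough illustration of how Hotelling's T² can flag deviations, the sketch below computes T² for a two-variable process. It assumes all principal components are retained, in which case the sum of squared scores divided by their eigenvalues equals dᵀS⁻¹d, so no explicit eigendecomposition is needed. The reference data, the two test points and the function names are hypothetical and are not taken from the study.

```python
def fit_reference(samples):
    """Mean vector and 2x2 sample covariance of in-control (x, y) observations."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def hotelling_t2(point, mean, cov):
    """T^2 = d^T S^{-1} d for one 2-variable observation (closed-form 2x2 inverse)."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Made-up in-control data in which the two variables rise together.
reference = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
mean, cov = fit_reference(reference)

t2_consistent = hotelling_t2((4.0, 8.0), mean, cov)  # follows the usual correlation
t2_deviating = hotelling_t2((3.0, 9.0), mean, cov)   # breaks the usual correlation
print(t2_consistent, t2_deviating)
```

The second point lies within the range of each variable taken separately, yet its T² is far larger because it violates the correlation structure learned from the reference data. In practice an observation would be compared against a control limit, which this sketch leaves out.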

6.
The work reported in this paper examines the use of principal component analysis (PCA), a technique of multivariate statistics, to facilitate the extraction of meaningful diagnostic information from a data set of chromatographic traces. Two data sets mimicking archived production records were analysed using PCA. In the first, a full-factorial experimental design approach was used to generate the data. In the second, the chromatograms were generated by adjusting just one of the process variables at a time. Database mining was achieved through the generation of both gross and disjoint principal component (PC) models. PCA provided easily interpretable 2-dimensional diagnostic plots revealing clusters of chromatograms obtained under similar operating conditions. PCA methods can be used to detect and diagnose changes in process conditions; however, results show that a PCA model may require recalibration if an equipment change is made. We conclude that PCA methods may be useful for the diagnosis of subtle deviations from process specification not readily distinguishable to the operator.

7.
In analysis of bioinformatics data, a unique challenge arises from the high dimensionality of measurements. Without loss of generality, we use genomic study with gene expression measurements as a representative example but note that analysis techniques discussed in this article are also applicable to other types of bioinformatics studies. Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expressions, called principal components (PCs). The PCs are orthogonal to each other, can effectively explain variation of gene expressions, and may have a much lower dimensionality. PCA is computationally simple and can be realized using many existing software packages. This article consists of the following parts. First, we review the standard PCA technique and its applications in bioinformatics data analysis. Second, we describe recent 'non-standard' applications of PCA, including accommodating interactions among genes, pathways and network modules and conducting PCA with estimating equations as opposed to gene expressions. Third, we introduce several recently proposed PCA-based techniques, including the supervised PCA, sparse PCA and functional PCA. The supervised PCA and sparse PCA have been shown to have better empirical performance than the standard PCA. The functional PCA can analyze time-course gene expression data. Last, we raise awareness of several critical but unsolved problems related to PCA. The goal of this article is to make bioinformatics researchers aware of the PCA technique and, more importantly, its most recent development, so that this simple yet effective dimension reduction technique can be better employed in bioinformatics data analysis.

8.
Hörnquist M, Hertz J, Wahde M. Bio Systems, 2003, 71(3): 311-317
Large-scale expression data are today measured for thousands of genes simultaneously. This development has been followed by an exploration of theoretical tools to get as much information out of these data as possible. Several groups have used principal component analysis (PCA) for this task. However, since this approach is data-driven, care must be taken in order not to analyze the noise instead of the data. As a strong warning against uncritical use of the output from a PCA, we employ a newly developed procedure to judge the effective dimensionality of a specific data set. Although this data set is obtained during the development of rat central nervous system, our finding is a general property of noisy time series data. Based on knowledge of the noise level for the data, we find that the effective number of dimensions that are meaningful to use in a PCA is much lower than what could be expected from the number of measurements. We attribute this fact both to effects of noise and the lack of independence of the expression levels. Finally, we explore the possibility of increasing the dimensionality by performing more measurements within one time series, and conclude that this is not a fruitful approach.

9.
Principal component analysis for clustering gene expression data   (cited 15 times in total: 0 self-citations, 15 citations by others)
MOTIVATION: There is a great need to develop analytical methodology to analyze and to exploit the information contained in gene expression data. Because of the large number of genes and the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. Other classical techniques, such as principal component analysis (PCA), have also been applied to analyze gene expression data. Using different data analysis techniques and different clustering algorithms to analyze the same data set can lead to very different conclusions. Our goal is to study the effectiveness of principal components (PCs) in capturing cluster structure. Specifically, using both real and synthetic gene expression data sets, we compared the quality of clusters obtained from the original data to the quality of clusters obtained after projecting onto subsets of the principal component axes. RESULTS: Our empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality. In particular, the first few PCs (which contain most of the variation in the data) do not necessarily capture most of the cluster structure. We also showed that clustering with PCs has different impact on different algorithms and different similarity metrics. Overall, we would not recommend PCA before clustering except in special circumstances.

10.
The paper presents an application of principal component analysis (PCA) to ECG processing. For this purpose the ECG beats are time-aligned and stored in the columns of an auxiliary matrix. The matrix, considered as a set of multidimensional variables, undergoes PCA. Reconstruction of the respective columns on the basis of a low dimensional principal subspace leads to the enhancement of the stored ECG beats. A few modifications of this classical approach to ECG signal filtering by means of a multivariate analysis are introduced. The first one is based on replacing the classical PCA by its robust extension. The second consists in replacing the analysis of the whole synchronized beats by the analysis of shorter signal segments. This creates the background for the third modification, which introduces the concept of variable dimensions of the subspaces corresponding to different parts of ECG beats. The experiments performed show that introduction of the respective modifications significantly improves the classical approach to ECG processing by application of principal component analysis.
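
The classical step described above, reconstructing each beat from a low-dimensional principal subspace, can be sketched as follows. This is only an illustration under stated assumptions: beats are stored as rows rather than columns for convenience, the subspace has dimension one, the leading component is found by plain power iteration, and the "beats" are synthetic Gaussian bumps with added noise rather than real ECG data. None of the paper's robust or segment-wise modifications are included, and all names are hypothetical.

```python
import math
import random


def leading_component(rows, iterations=200):
    """Leading eigenvector of the sample covariance of centered rows (power iteration)."""
    n, d = len(rows), len(rows[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        # Multiply by X^T X without forming the d x d covariance matrix.
        scores = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(scores[i] * rows[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def denoise_beats(beats):
    """Reconstruct each time-aligned beat from the mean beat plus its first PC."""
    n, d = len(beats), len(beats[0])
    mean = [sum(b[j] for b in beats) / n for j in range(d)]
    centered = [[b[j] - mean[j] for j in range(d)] for b in beats]
    pc = leading_component(centered)
    reconstructed = []
    for row in centered:
        score = sum(row[j] * pc[j] for j in range(d))
        reconstructed.append([mean[j] + score * pc[j] for j in range(d)])
    return reconstructed


def total_squared_error(estimate, truth):
    return sum((e - t) ** 2 for er, tr in zip(estimate, truth) for e, t in zip(er, tr))


# Synthetic stand-in for aligned beats: one template with varying amplitude.
rng = random.Random(0)
template = [math.exp(-((t - 25) / 4.0) ** 2) for t in range(50)]
amplitudes = [0.7 + 0.6 * i / 29 for i in range(30)]
clean = [[a * s for s in template] for a in amplitudes]
noisy = [[x + rng.gauss(0.0, 0.1) for x in beat] for beat in clean]

denoised = denoise_beats(noisy)
noisy_error = total_squared_error(noisy, clean)
denoised_error = total_squared_error(denoised, clean)
print(noisy_error, denoised_error)
```

Because the synthetic beats differ only in amplitude, a single component captures the beat-to-beat variation and most of the noise is discarded in the reconstruction. Real beats would generally need more components, which is the trade-off the variable-dimension modification in the abstract addresses.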

11.
Principal component analysis (PCA) is a one-group method. Its purpose is to transform correlated variables into uncorrelated ones and to find linear combinations accounting for a relatively large amount of the total variability, thus reducing the number of original variables to a few components only.
In the simultaneous analysis of different groups, similarities between the principal component structures can often be modelled by the methods of common principal components (CPCs) or partial CPCs. These methods assume that either all components or only some of them are common to all groups, the discrepancies being due mainly to sampling error.
Previous authors have dealt with the k-group situation either by pooling the data of all groups, or by pooling the within-group variance-covariance matrices before performing a PCA. The latter technique is known as multiple group principal component analysis or MGPCA (Thorpe, 1983a). We argue that CPC- or partial CPC-analysis is often more appropriate than these previous methods.
A morphometrical example using males and females of Microtus californicus and M. ochrogaster is presented, comparing PCA, CPC and partial CPC analyses. It is shown that the new methods yield estimated components having smaller standard errors than when groupwise analyses are performed. Formulas are given for estimating standard errors of the eigenvalues and eigenvectors, as well as for computing the likelihood ratio statistic used to test the appropriateness of the CPC- or partial CPC-model.
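
The one-group property stated at the start of this abstract, that PCA turns correlated variables into uncorrelated ones, can be checked with a small sketch. It assumes only two variables, for which the principal axes follow from a closed-form rotation angle, and uses made-up measurements. It does not implement the CPC or partial CPC models.

```python
import math


def covariance_2d(xs, ys):
    """Sample variances and covariance of two equally long measurement lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def pca_rotate_2d(xs, ys):
    """Center two variables and rotate them onto their principal axes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx, sxy, syy = covariance_2d(xs, ys)
    # Angle of the first principal axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * (x - mx) + s * (y - my) for x, y in zip(xs, ys)]
    pc2 = [-s * (x - mx) + c * (y - my) for x, y in zip(xs, ys)]
    return pc1, pc2


# Hypothetical pair of correlated measurements (not data from the paper).
length = [10.1, 11.4, 12.0, 13.2, 14.8, 15.1]
width = [5.0, 5.9, 6.1, 6.4, 7.6, 7.5]

pc1, pc2 = pca_rotate_2d(length, width)
var1, cov12, var2 = covariance_2d(pc1, pc2)
sxx, sxy, syy = covariance_2d(length, width)
print(var1, var2, cov12)
```

The rotated scores have zero covariance up to rounding, the first component carries the larger variance, and the total variance is unchanged by the rotation.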

12.
This paper examines the selection of the appropriate representation of chromatogram data prior to using principal component analysis (PCA), a multivariate statistical technique, for the diagnosis of chromatogram data sets. The effects of four process variables were investigated: flow rate, temperature, loading concentration and loading volume, for a size exclusion chromatography system used to separate three components (monomer, dimer, trimer). The study showed that major positional shifts in the elution peaks that result when running the separation at different flow rates caused the effects of other variables to be masked if the PCA is performed using elapsed time as the comparative basis. Two alternative methods of representing the data in chromatograms are proposed. In the first, data were converted to a volumetric basis prior to performing the PCA, while in the second, having made this transformation, the data were adjusted to account for the total material loaded during each separation. Two datasets were analysed to demonstrate the approaches. The results show that, by appropriate selection of the basis prior to the analysis, significantly greater process insight can be gained from the PCA, and demonstrate the importance of pre-processing prior to such analysis.

13.
Principal Component Analysis (PCA) is a classical technique in statistical data analysis, feature extraction and data reduction, aiming at explaining observed signals as a linear combination of orthogonal principal components. Independent Component Analysis (ICA) is a technique of array processing and data analysis, aiming at recovering unobserved signals or 'sources' from observed mixtures, exploiting only the assumption of mutual independence between the signals. The separation of the sources by ICA has great potential in applications such as the separation of sound signals (like voices mixed in simultaneous multiple records, for example), in telecommunication or in the treatment of medical signals. However, ICA is not yet often used by statisticians. In this paper, we shall present ICA in a statistical framework and compare this method with PCA for electroencephalogram (EEG) analysis. We shall see that ICA provides a more useful data representation than PCA, for instance, for the representation of a particular characteristic of the EEG named event-related potential (ERP).

14.
In this study, we propose to use principal component analysis (PCA) and a regression model to incorporate linkage disequilibrium (LD) in genomic association data analysis. To accommodate LD in genomic data and reduce multiple testing, we suggest performing PCA and extracting the PCA score to capture the variation of genomic data, after which regression analysis is used to assess the association of the disease with the principal component score. An empirical analysis shows that both the genotype-based correlation matrix and the haplotype-based LD matrix can produce similar results for PCA. The principal component score seems to be more powerful in detecting genetic association because it is quantitatively measured and may be able to capture the effect of multiple loci.
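
The two-stage procedure described above (extract a principal component score, then regress the trait on it) might look roughly like the sketch below. It assumes a genotype-based correlation matrix, a single leading component obtained by power iteration, and a quantitative trait analysed with simple least squares. The genotype counts, trait values and function names are invented for illustration and do not come from the study.

```python
import math


def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def first_pc_scores(genotypes, iterations=200):
    """Scores on the first PC of the genotype correlation matrix (power iteration)."""
    columns = [standardize(list(col)) for col in zip(*genotypes)]
    rows = list(zip(*columns))
    d = len(columns)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        scores = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in rows]


def simple_regression(x, y):
    """Least-squares slope, intercept and R^2 of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, (sxy * sxy) / (sxx * syy)


# Invented minor-allele counts (0, 1 or 2) at three loci in LD, ten individuals.
genotypes = [
    [0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 0], [1, 2, 1],
    [2, 2, 2], [2, 1, 2], [2, 2, 1], [0, 1, 0], [1, 0, 1],
]
# Invented quantitative trait that increases with the allele count at all loci.
trait = [1.0, 1.4, 2.9, 2.2, 3.8, 6.1, 5.2, 4.9, 1.1, 2.0]

scores = first_pc_scores(genotypes)
slope, intercept, r_squared = simple_regression(scores, trait)
print(slope, intercept, r_squared)
```

A single regression on the component score replaces three per-locus tests, which is how the approach reduces multiple testing. For a binary disease status, logistic regression would take the place of the least-squares fit used here.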

15.

Background  

Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model.

16.
The recent development of new gamma imagers based on scintillation arrays with high spatial resolution has strongly improved the possibility of detecting sub-centimeter cancer in scintimammography. However, Compton scattering contamination remains the main drawback since it limits the sensitivity of tumor detection. Principal component image analysis (PCA), recently introduced in scintimammographic imaging, is a data reduction technique able to represent the radiation emitted from chest, healthy breast tissue and damaged breast tissue as separate images. From these images a scintimammogram can be obtained in which the Compton contamination is “removed”. In the present paper we compared the PCA reconstructed images with the conventional scintimammographic images resulting from the photopeak (Ph) energy window. Data coming from a clinical trial were used. For both kinds of images the tumor presence was quantified by evaluating Student's t statistic for independent samples as a measure of the signal-to-noise ratio (SNR). Owing to the absence of Compton scattering, the PCA reconstructed images show better noise suppression and allow more reliable diagnostics in comparison with the images obtained from the photopeak energy window, reducing the tendency to produce false positives.

17.
Skjaerven L, Martinez A, Reuter N. Proteins, 2011, 79(1): 232-243
Principal component analysis (PCA) and normal mode analysis (NMA) have emerged as two invaluable tools for studying conformational changes in proteins. To compare these approaches for studying protein dynamics, we have used a subunit of the GroEL chaperone, whose dynamics is well characterized. We first show that both PCA on trajectories from molecular dynamics (MD) simulations and NMA reveal a general dynamical behavior in agreement with what has previously been described for GroEL. We thus compare the reproducibility of PCA on independent MD runs and subsequently investigate the influence of the length of the MD simulations. We show that there is a relatively poor one-to-one correspondence between eigenvectors obtained from two independent runs and conclude that caution should be taken when analyzing principal components individually. We also observe that increasing the simulation length does not improve the agreement with the experimental structural difference. In fact, relatively short MD simulations are sufficient for this purpose. We observe a rapid convergence of the eigenvectors (after ca. 6 ns). Although there is not always a clear one-to-one correspondence, there is a qualitatively good agreement between the movements described by the first five modes obtained with the three different approaches: PCA, all-atom NMA, and coarse-grained NMA. It is particularly interesting to relate this to the computational cost of the three methods. The results we obtain on the GroEL subunit contribute to the generalization of robust and reproducible strategies for the study of protein dynamics, using either NMA or PCA of trajectories from MD simulations.

18.
To determine whether pattern recognition based on metabolite fingerprinting for whole cell extracts can be used to discriminate cultivars metabolically, leaves and fruits of five commercial strawberry cultivars were subjected to Fourier transform infrared (FT-IR) spectroscopy. FT-IR spectral data from leaves were analyzed by principal component analysis (PCA) and Fisher’s linear discriminant function analysis. The dendrogram based on hierarchical clustering analysis of these spectral data separated the five commercial cultivars into two major groups with originality. The first group consisted of Korean cultivars including ‘Maehyang’, ‘Seolhyang’, and ‘Gumhyang’, whereas in the second group, ‘Ryukbo’ clustered with ‘Janghee’, both Japanese cultivars. The results from analysis of fruits were the same as of leaves. We therefore conclude that the hierarchical dendrogram based on PCA of FT-IR data from leaves represents the most probable chemotaxonomical relationship between cultivars, enabling discrimination of cultivars in a rapid and simple manner. The authors Suk Weon Kim and Sung Ran Min contributed equally to this work.

19.
Raman spectra provide rich but complex information about the chemical constituents of biological samples. Digital processing techniques are usually needed to extract the spectra of chemical constituents and their associated concentration profiles. However, spectral signatures may differ from those recorded on pure constituents, and these techniques require a priori knowledge of the spectra to be estimated. We propose in this study to analyse paraffin-embedded skin biopsies of malignant and benign tumors, dedicated to oncology research, by Raman spectroscopy and advanced signal processing methods. We show that the commonly used principal component analysis (PCA) does not give physically interpretable estimators of spectra and associated concentration profiles. Based on a linear model and taking into account the statistical properties of spectra, independent component analysis (ICA) is used to better estimate the spectra of chemical constituents. The estimators of associated concentration profiles are no longer orthogonal and have only positive values, contrary to PCA. ICA allows the paraffin to be modelled by three Raman spectra and provides good estimators of the underlying spectra of the human skin, which is of great interest in oncology since the retrieval of spectral features of different types of skin tumors is sufficient for their discrimination.

20.
Load carriage is a very common daily activity at home and in the workplace. Generally, the load is in the form of an external load carried by an individual; it could also be the excessive body mass carried by an overweight individual. To quantify the effects of carrying extra weight, whether in the form of an external load or excess body mass, motion capture data were generated for a diverse subject set. This consisted of twenty-three subjects generating one hundred fifteen trials for each loading condition. This study applied principal component analysis (PCA) to motion capture data in order to analyze the lower body gait patterns for four loading conditions: normal weight unloaded, normal weight loaded, overweight unloaded and overweight loaded. PCA has been shown to be a powerful tool for analyzing complex gait data. In this analysis, it is shown that in order to quantify the effects of external loads and/or for both normal weight and overweight subjects, the first principal component (PC1) is needed. For the work in this paper, PCs were generated from lower body joint angle data. The PC1 of the hip angle and the PC1 of the ankle angle are shown to be indicators of external load and BMI effects on temporal gait data.
