Similar Documents
20 similar documents found.
1.
宋莹  田心 《生物物理学报》2001,17(4):661-668
Some physiological signals, such as the EEG, originate from high-dimensional chaotic systems, so low-dimensional chaos theory and methods are not suitable for analyzing this kind of high-dimensional chaos. Principal Component Analysis based on Projection Pursuit (PP-PCA) was used to study dimensionality reduction of the high-dimensional Lorenz model system. After the method had first been applied successfully to linear and nonlinear noise-periodic models, PP-PCA dimensionality reduction of the high-dimensional chaotic Lorenz system was investigated. The results show that, with an appropriate choice of the nonlinear projection-pursuit PCA, the original system can be simplified to achieve dimensionality reduction while retaining the main dynamical characteristics of interest. The stability of the method and the feasibility of applying it to dimensionality reduction of high-dimensional EEG signals are also demonstrated.
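As a rough illustration of this setting, the sketch below simulates the Lorenz system, delay-embeds one observable into a higher-dimensional space and reduces it with ordinary PCA. It is a simplified stand-in for the projection-pursuit PCA described above, not the authors' method; embedding dimension, lag and component count are illustrative assumptions.

```python
# Simplified stand-in for PP-PCA: ordinary PCA on a delay-embedded Lorenz
# trajectory. Embedding dimension, lag and component count are illustrative.
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.decomposition import PCA

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z), x * y - beta * z]

sol = solve_ivp(lorenz, (0, 100), [1.0, 1.0, 1.0], t_eval=np.arange(0, 100, 0.01))
x = sol.y[0]                                   # one scalar observable of the chaotic system

dim, lag = 10, 5                               # delay-embed to mimic a high-dimensional measurement
idx = np.arange(len(x) - (dim - 1) * lag)
X = np.stack([x[idx + k * lag] for k in range(dim)], axis=1)

pca = PCA(n_components=3).fit(X)
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
Z = pca.transform(X)                           # reduced trajectory retaining the dominant dynamics
```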

2.
Principal component analysis (PCA) is a method of simplifying complex datasets to generate a lower number of parameters, while retaining the essential differences and allowing objective comparison of large numbers of datasets. Glycosaminoglycans (GAGs) are a class of linear sulfated carbohydrates with diverse sequences and consequent complex conformation and structure. Here, PCA is applied to three problems in GAG research: (i) distinguishing origins of heparin preparations, (ii) structural analysis of heparin derivatives, and (iii) classification of chondroitin sulfates (CS). The results revealed the following. (i) PCA of heparin ¹³C NMR spectra allowed their origins to be distinguished and structural differences were identified. (ii) Analysis of the information-rich ¹H and ¹³C NMR spectra of a series of systematically modified heparin derivatives uncovered underlying properties. These included the presence of interactions between residues, providing evidence that a degree of degeneracy exists in linkage geometry and that a different degree of variability exists for the two types of glycosidic linkage. The relative sensitivity of each position (C or H nucleus) in the disaccharide repeating unit to changes in O-, N-sulfation and N-acetylation was also revealed. (iii) Analysis of the ¹H NMR and CD spectra of a series of CS samples from different origins allowed their structural classification and highlighted the power of employing complementary spectroscopic methods in concert with PCA.
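A minimal sketch of the generic workflow behind point (i) — PCA scores of digitized spectra inspected by sample origin — is given below. The `spectra` matrix and `origin` labels are hypothetical placeholders, not the authors' NMR data.

```python
# Generic workflow sketch: PCA scores of digitized spectra, grouped by origin.
# `spectra` and `origin` are hypothetical placeholders for real 13C NMR data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
spectra = rng.normal(size=(30, 500))                 # stand-in for 30 digitized spectra
origin = np.repeat(["porcine", "bovine", "ovine"], 10)

scores = PCA(n_components=2).fit_transform(spectra)  # PCA centres the data internally
for o in np.unique(origin):
    print(o, np.round(scores[origin == o].mean(axis=0), 2))   # group positions along PC1/PC2
```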

3.
Oşan R  Zhu L  Shoham S  Tsien JZ 《PloS one》2007,2(5):e404
Recent advances in large-scale ensemble recordings allow monitoring of activity patterns of several hundreds of neurons in freely behaving animals. The emergence of such high-dimensional datasets poses challenges for the identification and analysis of dynamical network patterns. While several types of multivariate statistical methods have been used for integrating responses from multiple neurons, their effectiveness in pattern classification and predictive power has not been compared in a direct and systematic manner. Here we systematically employed a series of projection methods, such as Multiple Discriminant Analysis (MDA), Principal Components Analysis (PCA) and Artificial Neural Networks (ANN), and compared them with non-projection multivariate statistical methods such as Multivariate Gaussian Distributions (MGD). Our analyses of hippocampal data recorded during episodic memory events and cortical data simulated during face perception or arm movements illustrate how low-dimensional encoding subspaces can reveal the existence of network-level ensemble representations. We show how the use of regularization methods can prevent these statistical methods from over-fitting the training data sets when the trial numbers are much smaller than the number of recorded units. Moreover, we investigated the extent to which the computations implemented by the projection methods reflect the underlying hierarchical properties of the neural populations. Based on their ability to extract the essential features for pattern classification, we conclude that the typical performance ranking of these methods on under-sampled neural data of large dimension is MDA > PCA > ANN > MGD.
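The sketch below illustrates this kind of comparison on synthetic data with far fewer trials than units: a shrinkage-regularized LDA (a stand-in for MDA), PCA followed by logistic regression, and a naive multivariate Gaussian classifier. It is not the authors' analysis pipeline; data and settings are assumptions.

```python
# Sketch: compare a regularized discriminant projection (shrinkage LDA, a
# stand-in for MDA), PCA + logistic regression, and a multivariate Gaussian
# classifier when trials are much fewer than "units". Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=80, n_features=300, n_informative=20,
                           random_state=0)   # 80 "trials", 300 "units"

models = {
    "shrinkage LDA": LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
    "PCA + logistic": make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000)),
    "Gaussian (MGD)": GaussianNB(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```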

4.

Background  

Principal component analysis (PCA) has gained popularity as a method for the analysis of high-dimensional genomic data. However, it is often difficult to interpret the results because the principal components are linear combinations of all variables, and the coefficients (loadings) are typically nonzero. These nonzero values also reflect poor estimation of the true vector loadings; for example, for gene expression data, biologically we expect only a portion of the genes to be expressed in any tissue, and an even smaller fraction to be involved in a particular process. Sparse PCA methods have recently been introduced for reducing the number of nonzero coefficients, but these existing methods are not satisfactory for high-dimensional data applications because they still give too many nonzero coefficients.
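For orientation, the sketch below shows generic ℓ1-penalized sparse PCA with scikit-learn's SparsePCA, contrasting the number of nonzero loadings with ordinary PCA. This is an off-the-shelf example of existing sparse PCA, not the method proposed in this work; data and penalty are illustrative.

```python
# Generic sparse-PCA usage sketch: an l1 penalty drives most loadings to
# exactly zero, easing interpretation. Data and penalty are illustrative.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 500))            # stand-in for an expression matrix

dense = PCA(n_components=5).fit(X)
sparse = SparsePCA(n_components=5, alpha=2.0, random_state=0).fit(X)

print("nonzero loadings per component (PCA):   ",
      np.count_nonzero(dense.components_, axis=1))
print("nonzero loadings per component (sparse):",
      np.count_nonzero(sparse.components_, axis=1))
```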

5.
MOTIVATION: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis. RESULTS: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM-algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to 'denoise' a microarray dataset leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained.
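For reference, a compact EM implementation of standard probabilistic PCA (single, shared noise variance) is sketched below; the model described above additionally carries gene- and experiment-specific variances, which this simplified version omits.

```python
# EM for standard probabilistic PCA (homoscedastic noise), a simplified
# stand-in for the model above.
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.normal(size=(d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        M = W.T @ W + sigma2 * np.eye(q)
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                      # E[z_n], one row per sample
        Ezz = n * sigma2 * Minv + Ez.T @ Ez     # sum_n E[z_n z_n^T]
        # M-step: update loadings and noise variance
        W_new = Xc.T @ Ez @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2 * np.sum(Ez * (Xc @ W_new))
                  + np.trace(Ezz @ W_new.T @ W_new)) / (n * d)
        W = W_new
    return W, sigma2, mu

X = np.random.default_rng(2).normal(size=(50, 20))
W, sigma2, mu = ppca_em(X, q=3)
print("estimated noise variance:", sigma2)
```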

6.
The firing rates of individual neurons displaying mixed selectivity are modulated by multiple task variables. When mixed selectivity is nonlinear, it confers an advantage by generating a high-dimensional neural representation that can be flexibly decoded by linear classifiers. Although the advantages of this coding scheme are well accepted, the means of designing an experiment and analyzing the data to test for and characterize mixed selectivity remain unclear. With the growing number of large datasets collected during complex tasks, mixed selectivity is increasingly observed and is challenging to interpret correctly. We review recent approaches for analyzing and interpreting neural datasets and clarify the theoretical implications of mixed selectivity in the variety of forms that have been reported in the literature. We also aim to provide a practical guide for determining whether a neural population has linear or nonlinear mixed selectivity and whether this mixing leads to a categorical or category-free representation.

7.

Background  

SpectraClassifier (SC) is a Java solution for designing and implementing Magnetic Resonance Spectroscopy (MRS)-based classifiers. The main goal of SC is to allow users with minimum background knowledge of multivariate statistics to perform a fully automated pattern recognition analysis. SC incorporates feature selection (greedy stepwise approach, either forward or backward), and feature extraction (PCA). Fisher Linear Discriminant Analysis is the method of choice for classification. Classifier evaluation is performed through various methods: display of the confusion matrix of the training and testing datasets; K-fold cross-validation, leave-one-out and bootstrapping as well as Receiver Operating Characteristic (ROC) curves.
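SC itself is a Java application; purely as an illustration, the sketch below reproduces the same generic workflow in Python (PCA feature extraction, Fisher LDA classification, cross-validated confusion matrix and ROC AUC) on synthetic stand-in data.

```python
# Python sketch of the workflow SC automates: PCA feature extraction, Fisher
# LDA classification, cross-validation, confusion matrix and ROC AUC.
# Synthetic data stand in for MRS spectra and class labels.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=120, n_features=200, n_informative=15,
                           random_state=0)

clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
pred = cross_val_predict(clf, X, y, cv=10)
proba = cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]

print(confusion_matrix(y, pred))
print("ROC AUC:", roc_auc_score(y, proba))
```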

8.
Robust PCA and classification in biosciences
MOTIVATION: Principal components analysis (PCA) is a very popular dimension reduction technique that is widely used as a first step in the analysis of high-dimensional microarray data. However, the classical approach that is based on the mean and the sample covariance matrix of the data is very sensitive to outliers. Also, classification methods based on this covariance matrix do not give good results in the presence of outlying measurements. RESULTS: First, we propose a robust PCA (ROBPCA) method for high-dimensional data. It combines projection-pursuit ideas with robust estimation of low-dimensional data. We also propose a diagnostic plot to display and classify the outliers. This ROBPCA method is applied to several bio-chemical datasets. In one example, we also apply a robust discriminant method on the scores obtained with ROBPCA. We show that this combination of robust methods leads to better classifications than classical PCA and quadratic discriminant analysis. AVAILABILITY: All the programs are part of the Matlab Toolbox for Robust Calibration, available at http://www.wis.kuleuven.ac.be/stat/robust.html.
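The Matlab toolbox at the URL above contains the actual ROBPCA implementation. As a hedged, low-dimensional illustration of the idea, the sketch below compares classical PCA axes with axes derived from a robust (minimum covariance determinant) covariance estimate; this is a simplified stand-in, not ROBPCA, which additionally uses projection pursuit to handle high-dimensional data.

```python
# Low-dimensional robust-PCA illustration: eigenvectors of a robust (MCD)
# covariance estimate versus the classical sample covariance.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0], np.diag([9.0, 1.0, 0.1]), size=200)
X[:20] += 25.0                              # contaminate 10% of the samples with outliers

def principal_axes(cov):
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

classical_vals, _ = principal_axes(np.cov(X, rowvar=False))
mcd = MinCovDet(random_state=0).fit(X)
robust_vals, robust_vecs = principal_axes(mcd.covariance_)

print("classical eigenvalues:", np.round(classical_vals, 2))   # inflated by the outliers
print("robust eigenvalues:   ", np.round(robust_vals, 2))

scores = (X - mcd.location_) @ robust_vecs[:, :2]               # robust scores for outlier diagnostics
```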

9.

Background  

With the rapid development of new genetic measurement methods, several types of genetic alterations can be quantified in a high-throughput manner. While the initial focus has been on investigating each data set separately, there is an increasing interest in studying the correlation structure between two or more data sets. Multivariate methods based on Canonical Correlation Analysis (CCA) have been proposed for integrating paired genetic data sets. The high dimensionality of microarray data imposes computational difficulties, which have been addressed for instance by studying the covariance structure of the data, or by reducing the number of variables prior to applying the CCA. In this work, we propose a new method for analyzing high-dimensional paired genetic data sets, which mainly emphasizes the correlation structure and still permits efficient application to very large data sets. The method is implemented by translating a regularized CCA to its dual form, where the computational complexity depends mainly on the number of samples instead of the number of variables. The optimal regularization parameters are chosen by cross-validation. We apply the regularized dual CCA, as well as a classical CCA preceded by a dimension-reducing Principal Components Analysis (PCA), to a paired data set of gene expression changes and copy number alterations in leukemia.
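The sketch below shows the classical comparison pipeline mentioned in the abstract — PCA to reduce each paired data set, followed by CCA — on synthetic placeholders for the expression and copy-number matrices; the regularized dual CCA itself is not reproduced here.

```python
# Sketch of the classical pipeline: reduce each paired data set with PCA, then
# run CCA on the reduced representations. Synthetic placeholders stand in for
# expression and copy-number matrices.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
shared = rng.normal(size=(60, 3))                       # hidden common structure
X = shared @ rng.normal(size=(3, 500)) + 0.5 * rng.normal(size=(60, 500))
Y = shared @ rng.normal(size=(3, 300)) + 0.5 * rng.normal(size=(60, 300))

Xr = PCA(n_components=10).fit_transform(X)              # dimension reduction first
Yr = PCA(n_components=10).fit_transform(Y)

cca = CCA(n_components=3).fit(Xr, Yr)
U, V = cca.transform(Xr, Yr)
corrs = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(3)]
print("canonical correlations:", np.round(corrs, 2))
```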

10.
Learning gene expression programs directly from a set of observations is challenging due to the complexity of gene regulation, high noise of experimental measurements, and insufficient number of experimental measurements. Imposing additional constraints with strong and biologically motivated regularizations is critical in developing reliable and effective algorithms for inferring gene expression programs. Here we propose a new form of regularization that constrains the number of independent connectivity patterns between regulators and targets, motivated by the modular design of gene regulatory programs and the belief that the total number of independent regulatory modules should be small. We formulate a multi-target linear regression framework to incorporate this type of regularization, in which the number of independent connectivity patterns is expressed as the rank of the connectivity matrix between regulators and targets. We then generalize the linear framework to nonlinear cases, and prove that the generalized low-rank regularization model is still convex. Efficient algorithms are derived to solve both the linear and nonlinear low-rank regularized problems. Finally, we test the algorithms on three gene expression datasets, and show that the low-rank regularization improves the accuracy of gene expression prediction in these three datasets.
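A minimal proximal-gradient sketch of the linear, nuclear-norm (low-rank) regularized multi-target regression idea is given below; the step size, penalty and data are illustrative assumptions, and the authors' exact algorithms (including the nonlinear extension) are not reproduced.

```python
# Minimal proximal-gradient sketch for nuclear-norm (low-rank) regularized
# multi-target regression: min_B ||Y - X B||_F^2 / 2 + lam * ||B||_*.
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def low_rank_regression(X, Y, lam=5.0, n_iter=500):
    B = np.zeros((X.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y)
        B = svt(B - step * grad, step * lam)        # gradient step, then nuclear-norm prox
    return B

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))                       # regulator expression
B_true = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 30))   # rank-2 connectivity
Y = X @ B_true + 0.1 * rng.normal(size=(100, 30))    # target expression

B_hat = low_rank_regression(X, Y)
print("estimated rank:", np.linalg.matrix_rank(B_hat, tol=1e-6))
```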

11.

Background  

Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming.

12.
Visualization of gene expression profile data based on manifold learning
Visualization of gene expression profiles is essentially a problem of dimensionality reduction for high-dimensional data. Manifold learning algorithms were applied to the dimensionality reduction and visualization of gene expression profiles, and the applicability of two typical manifold learning algorithms (Isomap and LLE) to this task is discussed. The quality of the dimensionality reduction was evaluated quantitatively using within-class/between-class distances. Reduction analysis of two typical microarray datasets (a colon cancer gene expression dataset and an acute leukemia gene expression dataset) showed that the intrinsic dimensionality of both datasets is below 3, so manifold learning can be used to visualize them in a low-dimensional projection space. Compared with the projections produced by traditional dimensionality reduction methods (such as PCA and MDS), the Isomap manifold learning method gives better visualization results.
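A sketch of this workflow — embedding expression profiles with Isomap, LLE and PCA and scoring each embedding by a within-class/between-class distance ratio — is shown below on synthetic stand-in data; neighbourhood sizes and dimensions are illustrative.

```python
# Sketch: embed high-dimensional expression profiles with PCA, Isomap and LLE,
# then score each embedding by mean within-class / mean between-class distance
# (smaller is better). Synthetic data stand in for the microarray sets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.metrics import pairwise_distances

X, y = make_classification(n_samples=72, n_features=2000, n_informative=30,
                           random_state=0)

def within_between_ratio(Z, y):
    D = pairwise_distances(Z)
    same = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    diff = y[:, None] != y[None, :]
    return D[same].mean() / D[diff].mean()

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "Isomap": Isomap(n_neighbors=10, n_components=2).fit_transform(X),
    "LLE": LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X),
}
for name, Z in embeddings.items():
    print(f"{name}: within/between distance ratio = {within_between_ratio(Z, y):.2f}")
```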

13.
Stochastic ICA contrast maximisation using Oja's nonlinear PCA algorithm
Independent Component Analysis (ICA) is an important extension of linear Principal Component Analysis (PCA). PCA performs a data transformation to provide independence to second order, that is, decorrelation. ICA transforms data to provide approximate independence up to and beyond second order, yielding transformed data with fully factorable probability densities. The linear ICA transformation has been applied to the classical statistical signal-processing problem of Blind Separation of Sources (BSS), that is, separating unknown original source signals from a mixture whose mode of mixing is undetermined. In this paper it is shown that Oja's Nonlinear PCA algorithm performs a general stochastic online adaptive ICA. This analysis is corroborated with three simulations. The first separates unknown mixtures of original natural images, which have sub-Gaussian densities; the second separates linear mixtures of natural speech whose densities are super-Gaussian. Finally, unknown mixtures of original images, which have both sub- and super-Gaussian densities, are separated.
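A numpy sketch of the nonlinear PCA subspace rule, ΔW = μ·g(y)(x − Wᵀg(y))ᵀ with y = Wx, applied to prewhitened mixtures is given below. Here g = tanh and the sources are sub-Gaussian (uniform), loosely following the first simulation above; whether the rule converges to a separating solution depends on the source statistics and the chosen nonlinearity, and the learning rate and sample count are illustrative assumptions.

```python
# Sketch of the nonlinear PCA subspace rule, dW = mu * g(y) (x - W^T g(y))^T,
# y = W x, run on prewhitened mixtures of sub-Gaussian (uniform) sources with
# g = tanh. Learning rate and iteration count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_sources = 20000, 3
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n_sources, n_samples))   # sub-Gaussian sources
A = rng.normal(size=(n_sources, n_sources))                             # unknown mixing matrix
X = A @ S

# Prewhiten the mixtures (zero mean, approximately identity covariance).
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X

W = np.linalg.qr(rng.normal(size=(n_sources, n_sources)))[0]            # initial orthogonal W
mu = 0.01
for t in range(n_samples):
    x = Z[:, t]
    y = W @ x
    g = np.tanh(y)
    W += mu * np.outer(g, x - W.T @ g)                                   # nonlinear PCA update

Y = W @ Z                                                                # recovered signals
print("|corr(recovered, sources)|:")
print(np.round(np.abs(np.corrcoef(Y, S)[:n_sources, n_sources:]), 2))
```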

14.

Background  

Accurate methods for extraction of meaningful patterns in high dimensional data have become increasingly important with the recent generation of data types containing measurements across thousands of variables. Principal components analysis (PCA) is a linear dimensionality reduction (DR) method that is unsupervised in that it relies only on the data; projections are calculated in Euclidean or a similar linear space and do not use tuning parameters for optimizing the fit to the data. However, relationships within sets of nonlinear data types, such as biological networks or images, are frequently mis-rendered into a low dimensional space by linear methods. Nonlinear methods, in contrast, attempt to model important aspects of the underlying data structure, often requiring parameter(s) fitting to the data type of interest. In many cases, the optimal parameter values vary when different classification algorithms are applied on the same rendered subspace, making the results of such methods highly dependent upon the type of classifier implemented.

15.
MOTIVATION: Microarrays are capable of determining the expression levels of thousands of genes simultaneously. In combination with classification methods, this technology can be useful to support clinical management decisions for individual patients, e.g. in oncology. The aim of this paper is to systematically benchmark the role of non-linear versus linear techniques and dimensionality reduction methods. RESULTS: A systematic benchmarking study is performed by comparing linear versions of standard classification and dimensionality reduction techniques with their non-linear versions based on non-linear kernel functions with a radial basis function (RBF) kernel. A total of 9 binary cancer classification problems, derived from 7 publicly available microarray datasets, and 20 randomizations of each problem are examined. CONCLUSIONS: Three main conclusions can be formulated based on the performances on independent test sets. (1) When performing classification with least squares support vector machines (LS-SVMs) (without dimensionality reduction), RBF kernels can be used without risking too much overfitting. The results obtained with well-tuned RBF kernels are never worse and sometimes even statistically significantly better compared to results obtained with a linear kernel in terms of test set receiver operating characteristic and test set accuracy performances. (2) Even for classification with linear classifiers like LS-SVM with linear kernel, using regularization is very important. (3) When performing kernel principal component analysis (kernel PCA) before classification, using an RBF kernel for kernel PCA tends to result in overfitting, especially when using supervised feature selection. It has been observed that an optimal selection of a large number of features is often an indication for overfitting. Kernel PCA with linear kernel gives better results.
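One cell of such a benchmark can be sketched as follows: kernel PCA with a linear versus an RBF kernel, followed by a regularized linear classifier, scored by cross-validation. A linear-kernel SVC stands in for LS-SVM (which scikit-learn does not provide), and the data and hyperparameters are synthetic assumptions.

```python
# One cell of the benchmark: kernel PCA (linear vs RBF) followed by a
# regularized linear classifier, scored by cross-validation on synthetic
# "microarray-like" data. SVC with a linear kernel stands in for LS-SVM.
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=1000, n_informative=20,
                           random_state=0)

for kernel in ("linear", "rbf"):
    pipe = make_pipeline(StandardScaler(),
                         KernelPCA(n_components=10, kernel=kernel, gamma=1e-3),
                         SVC(kernel="linear", C=1.0))
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"kernel PCA ({kernel}) + linear SVC: {acc:.2f}")
```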

16.
Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious disadvantage of traditional PCA when it is applied to analyze data where interpretability is important. In applications where the features have some physical meaning, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of principal components. The sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. The stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to the high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.
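The basic (non-stochastic) proximal idea can be illustrated with a soft-thresholded power iteration for the leading sparse loading vector, as sketched below; the variance-reduced stochastic scheme of Cvx-SPCA is not shown, and the penalty value is illustrative.

```python
# Minimal sketch of the proximal idea behind l1-penalized PCA: a power
# iteration with soft-thresholding of the leading loading vector.
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_leading_component(X, lam=0.5, n_iter=200, seed=0):
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(Xc)                 # sample covariance
    v = np.random.default_rng(seed).normal(size=S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = soft_threshold(S @ v, lam)      # gradient-like step, then l1 prox
        norm = np.linalg.norm(v)
        if norm == 0:                       # lam too large: every loading shrunk away
            break
        v /= norm                           # project back onto the unit sphere
    return v

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
X[:, :5] += np.outer(rng.normal(size=200), np.ones(5))   # shared signal in 5 features only
v = sparse_leading_component(X, lam=0.5)
print("nonzero loadings:", np.count_nonzero(v), "of", len(v))
```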

17.
MOTIVATION: Since DNA microarray experiments provide us with a huge amount of gene expression data, they should be analyzed with statistical methods to extract the meanings of experimental results. Some dimensionality reduction methods such as Principal Component Analysis (PCA) are used to roughly visualize the distribution of high dimensional gene expression data. However, in the case of binary classification of gene expression data, PCA does not utilize class information when choosing axes. Thus clearly separable data in the original space may not be so in the reduced space used in PCA. RESULTS: For visualization and class prediction of gene expression data, we have developed a new SVM-based method called multidimensional SVMs, which generate multiple orthogonal axes. This method projects high dimensional data into lower dimensional space to exhibit properties of the data clearly and to visualize a distribution of the data roughly. Furthermore, the multiple axes can be used for class prediction. The basic properties of conventional SVMs are retained in our method: solutions of mathematical programming are sparse, and nonlinear classification is implemented implicitly through the use of kernel functions. The application of our method to the experimentally obtained gene expression datasets for patients' samples indicates that our algorithm is efficient and useful for visualization and class prediction. CONTACT: komura@hal.rcast.u-tokyo.ac.jp.

18.
High dimensionality and small sample sizes, and their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore a feature selection technique should be conducted prior to data classification to enhance prediction performance. In general, filter methods can be considered as principal or auxiliary selection mechanisms because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples shows that filter methods result in less accurate performance because they ignore the dependencies between features. Although a few publications have attempted to reveal the relationships between features with multivariate methods, these methods describe the relationships only linearly, and such simple linear combinations restrict the achievable improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components was determined by kernel Fisher linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. To demonstrate the effectiveness of our method we performed several experiments and compared the results with other competitive multivariate feature selectors. In our comparison, we used two classifiers (support vector machine, k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that the performance of our method is better than the others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma.

19.

Background  

Analysis of heart rate variability (HRV) has become a popular noninvasive tool for assessing the activities of the autonomic nervous system (ANS). HRV analysis is based on the concept that fast fluctuations may specifically reflect changes of sympathetic and vagal activity. It shows that the structure generating the signal is not simply linear, but also involves nonlinear contributions. The linear parameter, the power spectral index (LF/HF ratio), is calculated together with the nonlinear indices Poincaré plot geometry (SD1, SD2), Approximate Entropy (ApEn), Largest Lyapunov Exponent (LLE) and Detrended Fluctuation Analysis (DFA). The results show that heart rate variability decreases with aging. In this work, the ranges of all linear and nonlinear parameters for normal subjects in four age groups are presented with an accuracy of more than 89%.
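As a small worked example of two of the nonlinear descriptors, the sketch below computes the Poincaré-plot indices SD1 and SD2 (plus RMSSD) from an RR-interval series using their standard formulas; the RR series is simulated, and the spectral (LF/HF), entropy, Lyapunov and DFA indices are not shown.

```python
# Poincare-plot descriptors SD1/SD2 (plus RMSSD) from an RR-interval series,
# using their standard formulas. The RR series below is simulated.
import numpy as np

rng = np.random.default_rng(0)
rr = 0.8 + 0.05 * np.sin(np.arange(300) * 0.3) + 0.02 * rng.normal(size=300)  # RR intervals (s)

x, x_next = rr[:-1], rr[1:]
sd1 = np.std((x_next - x) / np.sqrt(2))        # short-term (beat-to-beat) variability
sd2 = np.std((x_next + x) / np.sqrt(2))        # long-term variability
rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))     # root mean square of successive differences

print(f"SD1 = {sd1 * 1000:.1f} ms, SD2 = {sd2 * 1000:.1f} ms, RMSSD = {rmssd * 1000:.1f} ms")
```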

20.
Least-squares methods for blind source separation based on nonlinear PCA
In standard blind source separation, one tries to extract unknown source signals from their instantaneous linear mixtures by using a minimum of a priori information. We have recently shown that certain nonlinear extensions of principal component type neural algorithms can be successfully applied to this problem. In this paper, we show that a nonlinear PCA criterion can be minimized using least-squares approaches, leading to computationally efficient and fast converging algorithms. Several versions of this approach are developed and studied, some of which can be regarded as neural learning algorithms. A connection to the nonlinear PCA subspace rule is also shown. Experimental results are given, showing that the least-squares methods usually converge clearly faster than stochastic gradient algorithms in blind separation problems.
