Similar Documents
20 similar documents retrieved.
1.
In the first part of this work we introduce new developments in Principal Component Analysis (PCA), and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate to deal with this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.
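The abstract does not specify the weighted correlation coefficient itself, so the sketch below only illustrates the general shape of a weighted PCA: a robustness-motivated per-sample weighting enters a weighted correlation matrix whose eigenvectors are then used as components. The weighting scheme, data sizes and function names are assumptions for illustration, not the authors' method.

```python
import numpy as np

def weighted_corr(X, w):
    """Weighted Pearson-style correlation matrix.
    X: (n_samples, n_genes); w: per-sample weights."""
    w = w / w.sum()
    mu = X.T @ w                               # weighted mean of each gene
    Xc = X - mu                                # center
    cov = (Xc * w[:, None]).T @ Xc             # weighted covariance
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

def wpca(X, w, n_components=3):
    """PCA on a weighted correlation matrix (WPCA-style sketch)."""
    R = weighted_corr(X, w)
    vals, vecs = np.linalg.eigh(R)             # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    return vals[order], vecs[:, order]

# Example: down-weight samples that look like outliers (large distance to the median).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))                 # 30 samples x 200 genes
X[0] += 10                                     # one contaminated sample
dist = np.linalg.norm(X - np.median(X, axis=0), axis=1)
w = 1.0 / (1.0 + (dist / np.median(dist)) ** 2)   # hypothetical weighting rule
eigvals, components = wpca(X, w)
print("leading eigenvalues:", np.round(eigvals, 2))
```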

2.
Selecting an appropriate variable subset in linear multivariate methods is an important methodological issue for ecologists. Interest often exists in obtaining general predictive capacity or in finding causal inferences from predictor variables. Because of a lack of solid knowledge on a studied phenomenon, scientists explore predictor variables in order to find the most meaningful (i.e. discriminating) ones. As an example, we modelled the response of the amphibious softwater plant Eleocharis multicaulis using canonical discriminant function analysis. We asked how variables can be selected through comparison of several methods: univariate Pearson chi-square screening, principal components analysis (PCA) and step-wise analysis, as well as combinations of some methods. We expected PCA to perform best. The selected methods were evaluated through fit and stability of the resulting discriminant functions and through correlations between these functions and the predictor variables. The chi-square subset, at P < 0.05, followed by a step-wise sub-selection, gave the best results. In contrast to expectations, PCA performed poorly, as did step-wise analysis. The different chi-square subset methods all yielded ecologically meaningful variables, while probable noise variables were also selected by PCA and step-wise analysis. We advise against the simple use of PCA or step-wise discriminant analysis to obtain an ecologically meaningful variable subset; the former because it does not take into account the response variable, the latter because noise variables are likely to be selected. We suggest that univariate screening techniques are a worthwhile alternative for variable selection in ecology.
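A minimal sketch of the recommended pipeline (univariate chi-square screening at P < 0.05, then a step-wise sub-selection) is given below. It uses scikit-learn's LinearDiscriminantAnalysis and SequentialFeatureSelector as stand-ins for canonical discriminant function analysis and the original step-wise procedure; the simulated presence/absence data are purely illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 15))                 # environmental predictor variables
# Presence/absence driven by a few predictors only (illustrative).
y = (X[:, 0] + 0.8 * X[:, 3] - 0.6 * X[:, 7] + rng.normal(size=150) > 0).astype(int)

# Step 1: univariate chi-square screening at P < 0.05 (predictors binned by quartile).
screened = []
for j in range(X.shape[1]):
    bins = np.quantile(X[:, j], [0.25, 0.5, 0.75])
    table = np.array([np.bincount(np.digitize(X[y == c, j], bins), minlength=4)
                      for c in (0, 1)])
    _, p, _, _ = chi2_contingency(table)
    if p < 0.05:
        screened.append(j)

# Step 2: step-wise sub-selection on the screened subset, LDA standing in for
# canonical discriminant function analysis.
if len(screened) > 3:
    sfs = SequentialFeatureSelector(LinearDiscriminantAnalysis(),
                                    n_features_to_select=3, direction="forward", cv=5)
    sfs.fit(X[:, screened], y)
    selected = [screened[i] for i in np.flatnonzero(sfs.get_support())]
else:
    selected = screened
print("chi-square subset:", screened, " final subset:", selected)
```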

3.
Studying EEG during mental tasks with wavelets and principal component analysis
This work studies the relationship between spontaneous EEG and mental activity. A WPCA algorithm combining wavelets and principal component analysis was applied to six-channel EEG recorded during different mental tasks, and several indices of the mental-task features, such as spectral energy and its rate of change, were analyzed and computed together. The results show that the WPCA algorithm not only removes noise but also increases the contribution rate of the principal components and reduces the dimensionality of the input vectors. Analysis of the EEG principal components reveals links between the EEG and the individual performing the task, the type of mental task, its complexity, and attention. Neural-network classification of the mental tasks confirms the effectiveness of WPCA for studying EEG and thinking, and provides a basis for further understanding cognitive processes and for localizing and classifying mental activity.
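A hedged sketch of a wavelet-denoise-then-PCA pipeline of this kind is shown below, using PyWavelets and NumPy. The wavelet family, threshold rule, simulated six-channel EEG and the "contribution rate" readout are assumptions; the paper's actual feature set is not reproduced.

```python
import numpy as np
import pywt

def wavelet_denoise(sig, wavelet="db4", level=4):
    """Soft-threshold the detail coefficients (universal threshold)."""
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise estimate
    thr = sigma * np.sqrt(2 * np.log(len(sig)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(sig)]

def pca(X):
    """Return eigenvalues (variances) and principal directions of X's columns."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Six simulated EEG channels: a shared 10 Hz rhythm plus channel noise.
rng = np.random.default_rng(2)
t = np.linspace(0, 4, 1024)
source = np.sin(2 * np.pi * 10 * t)
eeg = np.stack([source * a + 0.8 * rng.normal(size=t.size)
                for a in (1.0, 0.9, 0.7, 0.5, 0.3, 0.2)], axis=1)

denoised = np.column_stack([wavelet_denoise(eeg[:, ch]) for ch in range(6)])
vals, _ = pca(denoised)
print("contribution rate of first principal component: %.1f%%"
      % (100 * vals[0] / vals.sum()))
```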

4.
Hidden Markov modeling (HMM) can be applied to extract single channel kinetics at signal-to-noise ratios that are too low for conventional analysis. There are two general HMM approaches: traditional Baum's reestimation and direct optimization. The optimization approach has the advantage that it optimizes the rate constants directly. This allows setting constraints on the rate constants, fitting multiple data sets across different experimental conditions, and handling nonstationary channels where the starting probability of the channel depends on the unknown kinetics. We present here an extension of this approach that addresses the additional issues of low-pass filtering and correlated noise. The filtering is modeled using a finite impulse response (FIR) filter applied to the underlying signal, and the noise correlation is accounted for using an autoregressive (AR) process. In addition to correlated background noise, the algorithm allows for excess open channel noise that can be white or correlated. To maximize the efficiency of the algorithm, we derive the analytical derivatives of the likelihood function with respect to all unknown model parameters. The search of the likelihood space is performed using a variable metric method. Extension of the algorithm to data containing multiple channels is described. Examples are presented that demonstrate the applicability and effectiveness of the algorithm. Practical issues such as the selection of appropriate noise AR orders are also discussed through examples.
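The sketch below is only a toy version of the idea of fitting rate constants by direct likelihood optimization: a two-state channel with Gaussian noise, a forward-algorithm likelihood, and a variable metric (BFGS) search over the log rates. The FIR filter, AR noise model and analytical derivatives described in the abstract are not implemented, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def forward_loglik(log_rates, y, dt, noise_sd=0.2, levels=(0.0, 1.0)):
    """Log-likelihood (up to an additive constant) of a two-state closed/open
    channel observed in Gaussian noise; log_rates = log(k_open, k_close)."""
    k_co, k_oc = np.exp(log_rates)              # closed->open, open->closed rates
    k = k_co + k_oc
    e = np.exp(-k * dt)
    # Exact two-state transition matrix over one sampling interval.
    A = np.array([[(k_oc + k_co * e) / k, (k_co - k_co * e) / k],
                  [(k_oc - k_oc * e) / k, (k_co + k_oc * e) / k]])
    pi = np.array([k_oc, k_co]) / k              # equilibrium start probabilities
    B = np.exp(-0.5 * ((y[:, None] - np.array(levels)) / noise_sd) ** 2)
    alpha = pi * B[0]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, len(y)):
        alpha = (alpha @ A) * B[t]
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s                               # scaling avoids numerical underflow
    return loglik

# Simulate a noisy single-channel record, then fit the rates directly.
rng = np.random.default_rng(3)
dt, n, true_kco, true_koc = 1e-3, 3000, 50.0, 80.0
state, y = 0, []
for _ in range(n):
    if rng.random() < (true_kco if state == 0 else true_koc) * dt:
        state = 1 - state
    y.append(state + 0.2 * rng.normal())
y = np.array(y)

res = minimize(lambda p: -forward_loglik(p, y, dt), x0=np.log([10.0, 10.0]),
               method="BFGS")                    # BFGS is a variable metric method
print("estimated rates (1/s):", np.round(np.exp(res.x), 1))
```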

5.
We propose a new particle swarm optimization algorithm for problems where objective functions are subject to zero-mean, independent, and identically distributed stochastic noise. While particle swarm optimization has been successfully applied to solve many complex deterministic nonlinear optimization problems, straightforward applications of particle swarm optimization to noisy optimization problems are subject to failure because the noise in objective function values can lead the algorithm to incorrectly identify positions as the global/personal best positions. Instead of having the entire swarm follow a global best position based on the sample average of objective function values, the proposed new algorithm works with a set of statistically global best positions that include one or more positions with objective function values that are statistically equivalent, which is achieved using a combination of statistical subset selection and clustering analysis. The new PSO algorithm can be seamlessly integrated with adaptive resampling procedures to enhance the capability of PSO to cope with noisy objective functions. Numerical experiments demonstrate that the new algorithm is able to consistently find better solutions than the canonical particle swarm optimization algorithm in the presence of stochastic noise in objective function values with different resampling procedures.
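The statistical subset selection and clustering machinery of the proposed algorithm is not reproduced here. The sketch below only shows the baseline it builds on: a canonical PSO in which each noisy objective value is averaged over repeated evaluations (a simple resampling procedure). Function and parameter choices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def noisy_sphere(x, noise_sd=0.5):
    """Sphere function plus zero-mean i.i.d. Gaussian noise."""
    return float(np.sum(x ** 2) + noise_sd * rng.normal())

def evaluate(x, f, n_resamples=10):
    """Average repeated noisy evaluations (simple resampling)."""
    return np.mean([f(x) for _ in range(n_resamples)])

def pso(f, dim=5, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(-5, 5, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([evaluate(p, f) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([evaluate(p, f) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, evaluate(gbest, f, n_resamples=50)

best, best_val = pso(noisy_sphere)
print("best position:", np.round(best, 2), " re-evaluated value: %.3f" % best_val)
```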

6.
Principal Component Analysis (PCA) and Principal Subspace Analysis (PSA) are classic techniques in statistical data analysis, feature extraction and data compression. Given a set of multivariate measurements, PCA and PSA provide a smaller set of "basis vectors" with less redundancy, and a subspace spanned by them, respectively. Artificial neurons and neural networks have been shown to perform PSA and PCA when gradient ascent (descent) learning rules are used, which is related to the constrained maximization (minimization) of statistical objective functions. Due to their low complexity, such algorithms and their implementation in neural networks are potentially useful in cases of tracking slow changes of correlations in the input data or in updating eigenvectors with new samples. In this paper we propose a PCA learning algorithm that is fully homogeneous with respect to the neurons. The algorithm is obtained by modifying one of the best-known PSA learning algorithms, the Subspace Learning Algorithm (SLA). The modification is based on the Time-Oriented Hierarchical Method (TOHM), which uses two distinct time scales. On the faster time scale, the PSA algorithm is responsible for the "behavior" of all output neurons. On the slower scale, the output neurons compete to fulfil their "own interests": the basis vectors in the principal subspace are rotated toward the principal eigenvectors. At the end of the paper we briefly analyze how (and why) the time-oriented hierarchical method can be used to transform any existing neural-network PSA method into a PCA method.
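Below is a minimal NumPy sketch of the unmodified Subspace Learning Algorithm that the paper starts from; the TOHM two-time-scale modification is only indicated by a comment, not implemented. Data sizes, learning rate and epoch count are assumptions.

```python
import numpy as np

def sla(X, n_components=2, lr=0.01, epochs=50, seed=0):
    """Oja's Subspace Learning Algorithm: the columns of W converge to an
    orthonormal basis of the principal subspace (not the eigenvectors themselves)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], n_components))
    for _ in range(epochs):
        for x in X:
            y = W.T @ x                        # outputs of the linear neurons
            W += lr * np.outer(x - W @ y, y)   # SLA update: dW = lr (x - W y) y^T
            # A TOHM-style modification would add a slower, per-neuron term here
            # so that the basis rotates toward the individual principal eigenvectors.
    return W

rng = np.random.default_rng(5)
# Data with a dominant 2-D subspace embedded in 10 dimensions.
basis = np.linalg.qr(rng.normal(size=(10, 2)))[0]
X = (rng.normal(size=(500, 2)) * np.array([3.0, 1.5])) @ basis.T \
    + 0.1 * rng.normal(size=(500, 10))
W = sla(X - X.mean(axis=0))
# Singular values near 1 indicate that W spans (roughly) the same subspace as `basis`.
print(np.round(np.linalg.svd(W.T @ basis, compute_uv=False), 3))
```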

7.
Stuart L  Walter M  Borisyuk R 《Bio Systems》2002,67(1-3):265-279
The gravity transform algorithm is used to study the dependencies in firing of multi-dimensional spike trains. The pros and cons of this algorithm are discussed and the necessity for improved representation of output data is demonstrated. Parallel coordinates are introduced to visualise the results of the gravity transform and principal component analysis (PCA) is used to reduce the quantity of data represented whilst minimising loss of information.
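A rough illustration of the visualisation step (not the gravity transform itself) is sketched below: PCA reduces a feature matrix describing simulated spike trains, and pandas' parallel_coordinates draws the reduced coordinates. The feature construction is entirely hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

rng = np.random.default_rng(15)
# Hypothetical per-train features (e.g. flattened pairwise aggregation measures).
n_trains = 12
features = rng.normal(size=(n_trains, 66))
features[:6, :10] += 1.5                      # one correlated "assembly" of trains

reduced = PCA(n_components=4).fit_transform(features)
df = pd.DataFrame(reduced, columns=[f"PC{i + 1}" for i in range(4)])
df["group"] = ["assembly"] * 6 + ["other"] * 6

parallel_coordinates(df, "group", colormap="coolwarm")
plt.title("Spike-train features after PCA, shown as parallel coordinates")
plt.show()
```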

8.
MOTIVATION: The accumulation of genomic alterations is an important process in tumor formation and progression. Comparative genomic hybridization performed on cDNA arrays (cDNA aCGH) is a common method to investigate the genomic alterations on a genome-wide scale. However, when detecting low-level DNA copy number changes this technology requires the use of noise reduction strategies due to a low signal-to-noise ratio. RESULTS: Currently a running average smoothing filter is the most frequently used noise reduction strategy. We analyzed this strategy theoretically and experimentally and found that it is not sensitive to very low level genomic alterations. The presence of systematic errors in the data is one of the main reasons for this failure. We developed a novel algorithm which efficiently reduces systematic noise and allows for the detection of low-level genomic alterations. The algorithm is based on comparison of the biologically relevant data to data from so-called self-self hybridizations, additional experiments which contain no biological information but contain systematic errors. We find that with our algorithm the effective resolution for +/-1 DNA copy number changes is about 2 Mb. For copy number changes larger than three the effective resolution is on the level of single genes.
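The algorithm itself is not described in enough detail in the abstract to reproduce. The sketch below only illustrates the underlying idea with assumed details: estimate the probe-specific systematic error from self-self hybridizations, subtract it from the biological profile, and then apply the usual running average along genome order.

```python
import numpy as np

def correct_and_smooth(log_ratios, self_self, window=51):
    """Subtract systematic bias estimated from self-self hybridizations,
    then smooth with a running average along genome order."""
    bias = self_self.mean(axis=0)                  # per-probe systematic error
    corrected = log_ratios - bias
    kernel = np.ones(window) / window
    return np.convolve(corrected, kernel, mode="same")

rng = np.random.default_rng(6)
n_probes = 2000
systematic = 0.3 * rng.normal(size=n_probes)       # probe-specific bias
truth = np.zeros(n_probes)
truth[800:1000] = np.log2(3 / 2)                   # low-level single-copy gain

tumor = truth + systematic + 0.4 * rng.normal(size=n_probes)
self_self = systematic + 0.4 * rng.normal(size=(5, n_probes))   # 5 self-self arrays

smoothed = correct_and_smooth(tumor, self_self)
print("mean smoothed log-ratio inside gain: %.2f, outside: %.2f"
      % (smoothed[800:1000].mean(), np.delete(smoothed, range(800, 1000)).mean()))
```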

9.
While plants of a single species emit a diversity of volatile organic compounds (VOCs) to attract or repel interacting organisms, these specific messages may be lost in the midst of the hundreds of VOCs produced by sympatric plants of different species, many of which may have no signal content. Receivers must be able to reduce the babel or noise in these VOCs in order to correctly identify the message. For chemical ecologists faced with vast amounts of data on volatile signatures of plants in different ecological contexts, it is imperative to employ accurate methods of classifying messages, so that suitable bioassays may then be designed to understand message content. We demonstrate the utility of 'Random Forests' (RF), a machine-learning algorithm, for the task of classifying volatile signatures and choosing the minimum set of volatiles for accurate discrimination, using data from sympatric Ficus species as a case study. We demonstrate the advantages of RF over conventional classification methods such as principal component analysis (PCA), as well as data-mining algorithms such as support vector machines (SVM), diagonal linear discriminant analysis (DLDA) and k-nearest neighbour (KNN) analysis. We show why a tree-building method such as RF, which is increasingly being used by the bioinformatics, food technology and medical community, is particularly advantageous for the study of plant communication using volatiles, dealing, as it must, with abundant noise.
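A minimal example of this kind of analysis with scikit-learn is sketched below: a random forest is trained on a simulated VOC abundance matrix, out-of-bag accuracy is checked, and feature_importances_ is used to pick a small discriminating subset. The data, class structure and cut-offs are illustrative assumptions, not the study's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_samples, n_vocs = 90, 60
X = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, n_vocs))   # VOC abundances
species = rng.integers(0, 3, size=n_samples)                       # three Ficus species
X[:, 0] *= 1 + species             # a handful of VOCs carry the species signal
X[:, 1] *= 1 + 2 * (species == 2)

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, species)
print("out-of-bag accuracy:", round(rf.oob_score_, 3))

# Rank volatiles by importance and keep a small set that still classifies well.
order = np.argsort(rf.feature_importances_)[::-1]
top = order[:5]
score = cross_val_score(RandomForestClassifier(n_estimators=500, random_state=0),
                        X[:, top], species, cv=5).mean()
print("top VOC indices:", top, " 5-fold accuracy with 5 VOCs: %.3f" % score)
```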

10.
Hörnquist M  Hertz J  Wahde M 《Bio Systems》2003,71(3):311-317
Large-scale expression data are today measured for thousands of genes simultaneously. This development has been followed by an exploration of theoretical tools to get as much information out of these data as possible. Several groups have used principal component analysis (PCA) for this task. However, since this approach is data-driven, care must be taken in order not to analyze the noise instead of the data. As a strong warning against uncritical use of the output from a PCA, we employ a newly developed procedure to judge the effective dimensionality of a specific data set. Although this data set is obtained during the development of the rat central nervous system, our finding is a general property of noisy time series data. Based on knowledge of the noise level for the data, we find that the effective number of dimensions that are meaningful to use in a PCA is much lower than what could be expected from the number of measurements. We attribute this fact both to effects of noise and to the lack of independence of the expression levels. Finally, we explore the possibility of increasing the dimensionality by performing more measurements within one time series, and conclude that this is not a fruitful approach.
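One common way to judge effective dimensionality when the noise level is known is a parallel-analysis-style comparison of the observed eigenvalue spectrum with that of pure noise. The sketch below uses that approach as a stand-in for the paper's procedure, which the abstract does not spell out; all sizes and the noise level are assumptions.

```python
import numpy as np

def effective_dimensionality(X, noise_sd, n_noise_draws=200, quantile=0.95, seed=0):
    """Count PCA eigenvalues exceeding the null distribution produced by
    pure noise of known standard deviation (parallel-analysis style)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    eigvals = np.linalg.eigvalsh(np.cov(X - X.mean(0), rowvar=False))[::-1]
    null_max = [np.linalg.eigvalsh(np.cov(noise_sd * rng.normal(size=(n, p)),
                                          rowvar=False)).max()
                for _ in range(n_noise_draws)]
    threshold = np.quantile(null_max, quantile)
    return int(np.sum(eigvals > threshold)), eigvals, threshold

rng = np.random.default_rng(8)
n_timepoints, n_genes, noise_sd = 12, 100, 0.5       # few time points, many genes
latent = rng.normal(size=(n_timepoints, 2))          # only two real modes
loadings = rng.normal(size=(2, n_genes))
X = latent @ loadings + noise_sd * rng.normal(size=(n_timepoints, n_genes))

k, _, _ = effective_dimensionality(X, noise_sd)
print("measurements:", n_timepoints, "| effective (meaningful) dimensions:", k)
```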

11.
Structural refinement of predicted models of biological macromolecules using atomistic or coarse-grained molecular force fields having various degrees of error is investigated. The goal of this analysis is to estimate the probability of designing an effective structural refinement that computes conformational energies with a force field, starts from a structure predicted from the sequence (using template-based or template-free modeling), and refines it to bring the structure into closer proximity to the native state. It is widely believed that it should be possible to develop such a successful structure refinement algorithm by applying an iterative procedure with stochastic sampling and an appropriate energy function, which assesses the quality (correctness) of protein decoys. Here, the effect of noise in an artificially introduced scoring function is investigated for a model of an ideal sampling scheme, where the underlying distribution of RMSDs is assumed to be Gaussian. Sampling of the conformational space is performed by random generation of RMSD values. We demonstrate that whenever the random noise in a force field exceeds some level, it is impossible to obtain reliable structural refinement. The magnitude of noise above which structural refinement is, on average, impossible depends strongly on the quality of the sampling scheme and the size of the protein. Finally, possible strategies to overcome these intrinsic limitations of force fields and enable the development of successful refinement algorithms are discussed.
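The abstract's thought experiment can be reproduced in a few lines: decoy RMSDs drawn from a Gaussian around the starting model, a score equal to RMSD plus Gaussian "force-field" noise, and the fraction of trials in which the best-scoring decoy actually improves on the start. The numbers below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def refinement_success_rate(start_rmsd=4.0, spread=1.0, noise_sd=0.5,
                            n_decoys=500, n_trials=2000, seed=0):
    """Fraction of trials in which the lowest-scoring decoy has a lower RMSD
    than the starting model, when score = RMSD + Gaussian force-field noise."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(n_trials):
        rmsd = rng.normal(start_rmsd, spread, size=n_decoys)      # idealized sampling
        score = rmsd + rng.normal(0.0, noise_sd, size=n_decoys)   # noisy energy proxy
        if rmsd[np.argmin(score)] < start_rmsd:
            successes += 1
    return successes / n_trials

# As the score noise grows, selection degrades toward chance (about 0.5 here).
for noise in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
    print("score noise %5.1f -> success rate %.2f"
          % (noise, refinement_success_rate(noise_sd=noise)))
```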

12.
Principal component analysis (PCA) has been applied to a fed-batch fermentation for the production of streptokinase to identify the variables which are essential to formulate an adequate model. To mimic an industrial situation, Gaussian noise was introduced in the feed rate of the substrate. Both in the presence and in the absence of noise, the same five variables out of seven were selected by PCA. The minimal model, trained separately without and with noise, was able to predict satisfactorily the course of the fermentation for a condition not employed in training. These observations attest to the suitability of PCA for formulating minimal models of industrial-scale fermentations.
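A hedged sketch of PCA-based variable ranking for this kind of process data is shown below; the seven variables, their simulated trajectories and the importance measure (loading magnitudes weighted by explained variance) are assumptions for illustration, not the study's data or exact procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(9)
n = 200
t = np.linspace(0, 40, n)                                 # fermentation time (h)
feed = 0.5 + 0.01 * t + 0.02 * rng.normal(size=n)         # feed rate + Gaussian noise
biomass = 1 / (1 + np.exp(-(t - 15) / 3))
substrate = 10 - 8 * biomass + 0.1 * rng.normal(size=n)
product = 3 * biomass + 0.1 * rng.normal(size=n)           # streptokinase proxy
ph = 7 + 0.05 * rng.normal(size=n)                         # nearly constant
temp = 37 + 0.05 * rng.normal(size=n)
do2 = 80 - 30 * biomass + rng.normal(size=n)

names = ["feed", "biomass", "substrate", "product", "pH", "temperature", "DO2"]
X = StandardScaler().fit_transform(np.column_stack(
    [feed, biomass, substrate, product, ph, temp, do2]))

pca = PCA().fit(X)
# Variable importance: loading magnitudes on the PCs covering ~95% of the variance.
k = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), 0.95) + 1)
importance = np.abs(pca.components_[:k]).T @ pca.explained_variance_ratio_[:k]
for name, imp in sorted(zip(names, importance), key=lambda p: -p[1]):
    print(f"{name:12s} {imp:.3f}")
```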

13.
HIV virulence, i.e. the time of progression to AIDS, varies greatly among patients. As for other rapidly evolving pathogens of humans, it is difficult to know if this variance is controlled by the genotype of the host or that of the virus because the transmission chain is usually unknown. We apply the phylogenetic comparative approach (PCA) to estimate the heritability of a trait from one infection to the next, which indicates the control of the virus genotype over this trait. The idea is to use viral RNA sequences obtained from patients infected by HIV-1 subtype B to build a phylogeny, which approximately reflects the transmission chain. Heritability is measured statistically as the propensity for patients close in the phylogeny to exhibit similar infection trait values. The approach reveals that up to half of the variance in set-point viral load, a trait associated with virulence, can be heritable. Our estimate is significant and robust to noise in the phylogeny. We also check for the consistency of our approach by showing that a trait related to drug resistance is almost entirely heritable. Finally, we show the importance of taking into account the transmission chain when estimating correlations between infection traits. The fact that HIV virulence is, at least partially, heritable from one infection to the next has clinical and epidemiological implications. The difference between earlier studies and ours comes from the quality of our dataset and from the power of the PCA, which can be applied to large datasets and accounts for within-host evolution. The PCA opens new perspectives for approaches linking clinical data and evolutionary biology because it can be extended to study other traits or other infectious diseases.

14.
In certain image acquisition processes, as in fluorescence microscopy or astronomy, only a limited number of photons can be collected due to various physical constraints. The resulting images suffer from signal-dependent noise, which can be modeled as a Poisson distribution, and a low signal-to-noise ratio. However, the majority of research on noise reduction algorithms focuses on signal-independent Gaussian noise. In this paper, we model noise as a combination of Poisson and Gaussian probability distributions to construct a more accurate model, and adopt the contourlet transform, which provides a sparse representation of the directional components in images. We also apply hidden Markov models within a framework that neatly describes the spatial and interscale dependencies that are properties of the transform coefficients of natural images. An effective denoising algorithm for Poisson-Gaussian noise is thus proposed using the contourlet transform, hidden Markov models and noise estimation in the transform domain. We supplement the algorithm with cycle spinning and Wiener filtering for further improvements. We finally show experimental results with simulations and fluorescence microscopy images which demonstrate the improved performance of the proposed approach.
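Contourlet transforms and the paper's HMM machinery are not available in common Python packages, so the sketch below substitutes a generalized Anscombe transform (to stabilize Poisson-Gaussian noise), wavelet soft-thresholding in place of the contourlet/HMM step, and cycle spinning by averaging over shifts. It shows the overall shape of such a pipeline only; every parameter and the simple inverse transform are assumptions.

```python
import numpy as np
import pywt

def gen_anscombe(x, sigma):
    """Generalized Anscombe transform: Poisson-Gaussian -> roughly unit-variance Gaussian."""
    return 2 * np.sqrt(np.maximum(x + 3.0 / 8.0 + sigma ** 2, 0))

def inv_gen_anscombe(y, sigma):
    return (y / 2.0) ** 2 - 3.0 / 8.0 - sigma ** 2     # simple algebraic inverse

def denoise_once(img, wavelet="db2", level=3, thr=3.0):
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    coeffs = [coeffs[0]] + [tuple(pywt.threshold(c, thr, mode="soft") for c in lvl)
                            for lvl in coeffs[1:]]
    return pywt.waverec2(coeffs, wavelet)[: img.shape[0], : img.shape[1]]

def denoise(img, sigma, shifts=8):
    """Cycle spinning: average the denoising results over random circular shifts."""
    rng = np.random.default_rng(0)
    stab = gen_anscombe(img, sigma)
    acc = np.zeros_like(stab)
    for _ in range(shifts):
        dy, dx = rng.integers(0, 16, size=2)
        shifted = np.roll(np.roll(stab, dy, axis=0), dx, axis=1)
        den = denoise_once(shifted)
        acc += np.roll(np.roll(den, -dy, axis=0), -dx, axis=1)
    return inv_gen_anscombe(acc / shifts, sigma)

# Simulated low-photon fluorescence image: Poisson counts plus Gaussian read noise.
rng = np.random.default_rng(10)
truth = 5 + 20 * (np.indices((128, 128)).sum(axis=0) % 32 < 16)   # striped object
sigma = 2.0
noisy = rng.poisson(truth).astype(float) + sigma * rng.normal(size=truth.shape)
restored = denoise(noisy, sigma)
print("RMSE noisy: %.2f  restored: %.2f"
      % (np.sqrt(np.mean((noisy - truth) ** 2)),
         np.sqrt(np.mean((restored - truth) ** 2))))
```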

15.
Sequence analysis of large protein families can produce sub-clusters even within the same family. In some cases, it is of interest to know precisely which amino acid position variations are most responsible for driving separation into sub-clusters. In large protein families composed of large proteins, it can be quite challenging to assign the relative importance to specific amino acid positions. Principal components analysis (PCA) is ideal for such a task, since the problem is posed in a large variable space, i.e. the number of amino acids that make up the protein sequence, and PCA is powerful at reducing the dimensionality of complex problems by projecting the data into an eigenspace that represents the directions of greatest variation. However, PCA of aligned protein sequence families is complicated by the fact that protein sequences are traditionally represented by single letter alphabetic codes, whereas PCA of protein sequence families requires conversion of sequence information into a numerical representation. Here, we introduce a new amino acid sequence conversion algorithm optimized for PCA data input. The method is demonstrated using a small artificial dataset to illustrate the characteristics and performance of the algorithm, as well as a small protein sequence family consisting of nine members, COG2263, and finally with a large protein sequence family, Pfam04237, which contains more than 1,800 sequences that group into two sub-clusters.
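The paper's conversion algorithm is not given in the abstract; the sketch below uses plain one-hot encoding as a stand-in, runs PCA on a tiny artificial alignment, and reads the positions driving sub-cluster separation off the PC1 loadings. The alignment, alphabet handling and importance measure are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

def one_hot(alignment):
    """Encode an aligned family as an (n_seqs, n_positions * 21) binary matrix."""
    idx = {aa: i for i, aa in enumerate(ALPHABET)}
    n, L = len(alignment), len(alignment[0])
    X = np.zeros((n, L * len(ALPHABET)))
    for s, seq in enumerate(alignment):
        for p, aa in enumerate(seq.upper()):
            X[s, p * len(ALPHABET) + idx.get(aa, idx["-"])] = 1.0
    return X

# Tiny artificial family: two sub-clusters differing mainly at positions 2-3 (0-based).
alignment = ["MKTAYIAK", "MKTAYIAR", "MKTAYLAK", "MKTGYIAK",
             "MKNDYIAK", "MKNDYVAK", "MKNDYIAR", "MKNDFIAK"]
X = one_hot(alignment)

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)
# Per-position importance: squared PC1 loadings summed over each position's 21 columns.
importance = (pca.components_[0] ** 2).reshape(len(alignment[0]), len(ALPHABET)).sum(axis=1)
print("sub-cluster separation along PC1:", np.round(scores[:, 0], 2))
print("position contributing most to the split:", int(np.argmax(importance)))
```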

16.
Population stratification is a problem in genetic association studies because it is likely to highlight loci that underlie the population structure rather than disease-related loci. At present, principal component analysis (PCA) has proven to be an effective way to correct for population stratification. However, the conventional PCA algorithm is time-consuming when dealing with large datasets. We developed graphics processing unit (GPU)-based PCA software named SHEsis PCA (http://analysis.bio-x.cn/SHEsisMain.htm) that is highly parallel, with a maximum speedup greater than 100 compared with its CPU version. A clustering algorithm based on X-means was also implemented to detect population subgroups and to obtain matched cases and controls, in order to reduce genomic inflation and increase power. A study of both simulated and real datasets showed that SHEsis PCA ran at an extremely high speed while hardly reducing accuracy. Therefore, SHEsis PCA can help correct for population stratification much more efficiently than conventional CPU-based algorithms.
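The GPU kernels and X-means matching of SHEsis PCA are not reproduced here; the sketch below only shows the underlying correction on the CPU: PCA of a standardized genotype matrix, with the top components used as covariates in a logistic association test (statsmodels). The simulated two-population data are an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
import statsmodels.api as sm

rng = np.random.default_rng(11)
n, m = 600, 1000                                   # individuals, SNPs
pop = rng.integers(0, 2, size=n)                   # two hidden sub-populations
freqs = np.where(pop[:, None] == 0,
                 rng.uniform(0.1, 0.5, size=m),
                 rng.uniform(0.3, 0.9, size=m))    # allele frequencies differ by population
G = rng.binomial(2, freqs)                         # genotype matrix (0/1/2)
# Phenotype depends on population membership (stratification), not on any SNP.
y = rng.binomial(1, np.where(pop == 1, 0.7, 0.3))

# Standardize genotypes and extract the top principal components.
p_hat = G.mean(axis=0) / 2
Z = (G - 2 * p_hat) / np.sqrt(2 * p_hat * (1 - p_hat))
pcs = PCA(n_components=5).fit_transform(Z)

def assoc_pvalue(snp, covariates=None):
    """Logistic-regression p-value for one SNP, optionally PC-adjusted."""
    X = snp[:, None] if covariates is None else np.column_stack([snp[:, None], covariates])
    fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    return fit.pvalues[1]

j = 0   # a null SNP whose frequency nevertheless differs between populations
print("unadjusted p = %.3g, PC-adjusted p = %.3g"
      % (assoc_pvalue(G[:, j].astype(float)),
         assoc_pvalue(G[:, j].astype(float), pcs)))
```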

17.
In electron tomography the reconstructed density function is typically corrupted by noise and artifacts. Under those conditions, separating the meaningful regions of the reconstructed density function is not trivial. Despite development efforts that specifically target electron tomography, manual segmentation continues to be the preferred method. Based on previous good experience with segmentation based on fuzzy-logic principles (fuzzy segmentation) in settings where the reconstructed density functions also have a low signal-to-noise ratio, we applied it to electron tomographic reconstructions. We demonstrate the usefulness of the fuzzy segmentation algorithm by evaluating it, within its limits, on electron tomograms of selectively stained, plastic-embedded spiny dendrites. The results produced by the fuzzy segmentation algorithm within the framework presented are encouraging.
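The specific fuzzy segmentation algorithm is not detailed in the abstract; the sketch below uses a generic fuzzy c-means on voxel intensities as a stand-in, applied to a simulated low-contrast "tomogram" slice. Cluster count, fuzzifier and the defuzzification threshold are assumptions.

```python
import numpy as np

def fuzzy_cmeans(values, n_clusters=2, m=2.0, iters=100, seed=0):
    """Fuzzy c-means on voxel intensities; returns memberships and cluster centers."""
    rng = np.random.default_rng(seed)
    x = values.ravel().astype(float)
    centers = rng.choice(x, size=n_clusters, replace=False)
    for _ in range(iters):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        centers = (u ** m).T @ x / (u ** m).sum(axis=0)
    return u.reshape(values.shape + (n_clusters,)), centers

# Simulated slice: a stained dendrite on a noisy background (low signal-to-noise ratio).
rng = np.random.default_rng(12)
truth = np.zeros((64, 64))
truth[16:48, 24:40] = 2.0
volume = truth + 0.7 * rng.normal(size=truth.shape)

memberships, centers = fuzzy_cmeans(volume)
foreground = int(np.argmax(centers))
segmented = memberships[..., foreground] > 0.5        # defuzzify by thresholding
jaccard = (segmented & (truth > 0)).sum() / (segmented | (truth > 0)).sum()
print("cluster centers:", np.round(centers, 2), " Jaccard vs. truth: %.2f" % jaccard)
```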

18.
Noise is ubiquitous in the environments of humans and animals, and organisms from invertebrates to mammals, including humans, are negatively affected by it. Intense noise damages the structure and function of the auditory system, causing noise-induced hearing loss (NIHL). This review summarizes the types of NIHL, the factors that influence it, and the possible mechanisms underlying the different degrees of hearing loss caused by noise. NIHL is mainly associated with swelling of synaptic structures, reversible glutamate-induced excitotoxicity, oxidative stress caused by reactive oxygen species, apoptosis, damage to synaptic ribbons, and up-regulated expression of the mRNA of the guanine nucleotide binding protein alpha stimulating (GNAS) gene and of its upstream lncRNA Sept7. Comparing hearing loss across species after noise exposure shows that fish and birds, which can regenerate hair cells, recover relatively quickly from hearing damage; rodents are comparatively susceptible to noise; echolocating cetaceans show only small temporary threshold shifts after noise exposure; and, intriguingly, echolocating bats show no temporary threshold shift even after high-intensity exposure. These findings suggest that comparative physiological studies across species can further reveal the mechanisms of NIHL and provide a theoretical basis for hearing protection and for repair after noise-induced hearing damage.

19.
This article describes the application of a change-point algorithm to the analysis of stochastic signals in biological systems whose underlying state dynamics consist of transitions between discrete states. Applications of this analysis include molecular-motor stepping, fluorophore bleaching, electrophysiology, particle and cell tracking, detection of copy number variation by sequencing, tethered-particle motion, etc. We present a unified approach to the analysis of processes whose noise can be modeled by Gaussian, Wiener, or Ornstein-Uhlenbeck processes. To fit the model, we exploit explicit, closed-form algebraic expressions for maximum-likelihood estimators of model parameters and estimated information loss of the generalized noise model, which can be computed extremely efficiently. We implement change-point detection using the frequentist information criterion (which, to our knowledge, is a new information criterion). The frequentist information criterion specifies a single, information-based statistical test that is free from ad hoc parameters and requires no prior probability distribution. We demonstrate this information-based approach in the analysis of simulated and experimental tethered-particle-motion data.
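The frequentist information criterion itself is not reproduced here; the sketch below only shows the general shape of such an analysis for the Gaussian case: a closed-form maximum-likelihood segment cost and recursive binary segmentation, with a BIC-style penalty standing in for the paper's criterion. Signal, noise level and penalty are assumptions.

```python
import numpy as np

def gaussian_cost(y):
    """Twice the negative maximum log-likelihood (up to a constant) of a
    constant-mean Gaussian segment with its variance estimated from the data."""
    return len(y) * np.log(max(y.var(), 1e-12))

def binary_segmentation(y, penalty, min_size=5):
    """Recursively split wherever the ML cost drops by more than the penalty."""
    n = len(y)
    parent = gaussian_cost(y)
    best_gain, best_k = 0.0, None
    for k in range(min_size, n - min_size):
        gain = parent - gaussian_cost(y[:k]) - gaussian_cost(y[k:]) - penalty
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None:
        return []
    left = binary_segmentation(y[:best_k], penalty, min_size)
    right = binary_segmentation(y[best_k:], penalty, min_size)
    return left + [best_k] + [best_k + c for c in right]

# Simulated stepping trace (e.g. a molecular motor) with Gaussian noise.
rng = np.random.default_rng(13)
levels = [0.0, 8.0, 16.0, 12.0, 20.0]
y = np.concatenate([lvl + 2.0 * rng.normal(size=200) for lvl in levels])
penalty = 2 * np.log(len(y))                      # BIC-style penalty (an assumption)
print("detected change points:", binary_segmentation(y, penalty))
# The true change points are at samples 200, 400, 600 and 800.
```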

20.
The retina is a layered structure, and changes in the thickness of retinal layers can be used clinically to predict and diagnose certain diseases. To segment the different retinal layers quickly and accurately, this paper proposes a principal-component-analysis-based random forest algorithm for layer segmentation of retinal optical coherence tomography (OCT) images. The method uses principal component analysis (PCA) to re-project the features collected for the random forest and retains the feature dimensions with large weights after re-projection, thereby removing correlation and information redundancy between feature dimensions. The results show that, with a total feature dimensionality of 29, keeping the first 18 dimensions speeds up training by 23.20% and keeping 14 dimensions speeds it up by 42.38%, with little effect on segmentation accuracy; the experiments show that the method effectively improves the efficiency of the algorithm.
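A hedged scikit-learn sketch of the idea (PCA re-projection of the per-pixel features, keep the leading dimensions, then train the random forest and compare training time) is given below. The OCT feature extraction is not reproduced; random correlated features of dimension 29 stand in, and the choice of 14 retained components mirrors the abstract.

```python
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(14)
n_pixels, n_features, n_layers = 6000, 29, 4        # 29 features per pixel, 4 layer labels
informative = rng.normal(size=(n_pixels, 6))
labels = np.argmax(informative[:, :n_layers]
                   + 0.5 * rng.normal(size=(n_pixels, n_layers)), axis=1)
X = np.column_stack([informative @ rng.normal(size=(6, n_features - 6)),
                     informative])                  # correlated, redundant feature set
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

def fit_and_score(Xtr, Xte):
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    t0 = time.perf_counter()
    rf.fit(Xtr, y_tr)
    return time.perf_counter() - t0, rf.score(Xte, y_te)

t_full, acc_full = fit_and_score(X_tr, X_te)
pca = PCA(n_components=14).fit(X_tr)                # keep the leading 14 of 29 dimensions
t_pca, acc_pca = fit_and_score(pca.transform(X_tr), pca.transform(X_te))
print("full 29 features:   %.2fs, accuracy %.3f" % (t_full, acc_full))
print("top 14 components:  %.2fs, accuracy %.3f" % (t_pca, acc_pca))
```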
