Similar Literature
20 similar documents found (search time: 31 ms)
1.
Direct kernels, due to Lauder (1983), are considered as an alternative to the indirect kernel method in discriminant analysis. It is shown that direct kernels may be based on any kernel function known in discrete density estimation. The choice of smoothing parameters is based on general loss functions, and a family of loss functions specific to the discrimination problem is introduced. Examples with distance-dependent and distance-independent smoothing parameters are given to illustrate the applicability.
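As a hedged illustration of the direct-kernel idea (estimate each class-conditional discrete density with a kernel and classify by the largest prior-weighted estimate), the sketch below uses the Aitchison–Aitken kernel, one standard choice in discrete density estimation. The toy data, the fixed smoothing parameter lam, and the helper names are illustrative assumptions, not Lauder's original construction.

```python
import numpy as np

def aitchison_aitken(x, xi, n_levels, lam):
    """Discrete kernel: weight lam on a match, (1 - lam)/(n_levels - 1) otherwise."""
    return np.where(x == xi, lam, (1.0 - lam) / (n_levels - 1))

def class_density(x, class_points, n_levels, lam):
    """Direct kernel estimate of the class-conditional discrete density at x."""
    # Product kernel over the categorical features, averaged over the class sample.
    return np.prod(aitchison_aitken(x, class_points, n_levels, lam), axis=1).mean()

def kernel_discriminant(x, classes, priors, n_levels=2, lam=0.8):
    """Assign x to the class with the largest prior-weighted kernel density."""
    scores = {c: priors[c] * class_density(x, pts, n_levels, lam)
              for c, pts in classes.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
# Toy data: three binary features; class 1 has a higher probability of ones.
classes = {0: (rng.random((30, 3)) < 0.3).astype(int),
           1: (rng.random((30, 3)) < 0.7).astype(int)}
priors = {0: 0.5, 1: 0.5}
print(kernel_discriminant(np.array([1, 1, 0]), classes, priors))
```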

2.
This work presents a novel and extensive investigation of mathematical regression techniques for the prediction of laboratory-type kinematic measurements during human gait from wearable measurement devices such as gyroscopes and accelerometers. Specifically, we examine the hypothesis of predicting the segmental angles of the legs (left and right foot, shank and thighs) from rotational foot velocities and translational foot accelerations. This first investigation is based on kinematic data emulated from motion-capture laboratory equipment. We employ eight established regression algorithms with different properties, ranging from linear methods and neural networks with polynomial support and expanded nonlinearities, to radial basis functions, nearest neighbors and kernel density methods. Data from five gait cycles of eight subjects are used to perform both inter-subject and intra-subject assessments of the prediction capabilities of each algorithm, using cross-validation resampling methods. Regarding algorithmic suitability for gait prediction, the results strongly indicate that nonparametric methods, such as nearest neighbors and kernel density based estimators, are particularly advantageous. Numerical results show high average prediction accuracy (ρ = 0.98/0.99, RMS = 5.63°/2.30°, MAD = 4.43°/1.52° for inter-/intra-subject testing). The presented work provides a promising and motivating investigation into the feasibility of cost-effective wearable devices for acquiring large volumes of data that are currently collected only in complex laboratory environments.
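A hedged sketch of the nonparametric route the authors favor: regress a segment angle on foot angular velocity and acceleration with k-nearest neighbors, assessed with subject-wise (inter-subject) cross-validation. It uses scikit-learn on synthetic stand-in signals, since the paper's actual data, feature set, and algorithm settings are not given here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GroupKFold
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

# Synthetic stand-in: 8 "subjects", each with samples of
# [foot angular velocity, foot acceleration] -> one segment angle (degrees).
X, y, groups = [], [], []
for subject in range(8):
    t = np.linspace(0, 2 * np.pi, 200)
    gyro = np.sin(t) + 0.05 * rng.standard_normal(t.size)
    accel = np.cos(t) + 0.05 * rng.standard_normal(t.size)
    angle = 20 * np.sin(t + 0.3)          # assumed "ground-truth" segment angle
    X.append(np.column_stack([gyro, accel]))
    y.append(angle)
    groups.append(np.full(t.size, subject))
X, y, groups = np.vstack(X), np.concatenate(y), np.concatenate(groups)

# Inter-subject assessment: each fold holds out whole subjects.
cv = GroupKFold(n_splits=4)
for train_idx, test_idx in cv.split(X, y, groups):
    model = KNeighborsRegressor(n_neighbors=15).fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    rms = np.sqrt(mean_squared_error(y[test_idx], pred))
    rho = np.corrcoef(y[test_idx], pred)[0, 1]
    print(f"RMS = {rms:.2f} deg, rho = {rho:.2f}")
```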

3.
Choosing an appropriate kernel is critical when classifying a new problem with a Support Vector Machine. So far, more attention has been paid to constructing new kernels and choosing suitable parameter values for a specific kernel function than to kernel selection. Furthermore, most current kernel selection methods focus on seeking the single best kernel with the highest classification accuracy via cross-validation; they are time-consuming and ignore the differences among the number of support vectors and the CPU time of SVMs with different kernels. Considering the trade-off between classification success ratio and CPU time, multiple kernel functions may perform equally well on the same classification problem. To automatically select appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on data characteristics. For each data set, a meta-knowledge base is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge base with a multi-label classification method. Finally, appropriate kernel functions are recommended for a new data set by the recommendation model according to the characteristics of the new data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with existing kernel selection methods and the most widely used RBF kernel, SVM with the kernel function recommended by our proposed method achieves the highest classification performance.
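The recommendation pipeline can be sketched as multi-label classification over dataset meta-features. The specific meta-features, the candidate kernel labels, and the random-forest-based multi-label model below are illustrative assumptions; the paper's own meta-knowledge base and label definitions are not reproduced.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier

KERNELS = ["linear", "poly", "rbf", "sigmoid", "laplace"]  # subset, for illustration

def meta_features(X, y):
    """A few simple data characteristics used as meta-features (assumed set)."""
    n, d = X.shape
    class_balance = np.bincount(y).min() / np.bincount(y).max()
    mean_abs_corr = np.abs(np.corrcoef(X, rowvar=False)).mean()
    return np.array([np.log(n), d, class_balance, mean_abs_corr])

# Meta-knowledge base: one row of meta-features per historical data set,
# one binary label per kernel meaning "this kernel performed acceptably".
rng = np.random.default_rng(2)
meta_X = rng.normal(size=(40, 4))                            # stand-in meta-features
meta_Y = (rng.random((40, len(KERNELS))) > 0.5).astype(int)  # stand-in labels

recommender = MultiOutputClassifier(RandomForestClassifier(n_estimators=100))
recommender.fit(meta_X, meta_Y)

# Recommend kernels for a new (here, synthetic) data set.
X_new = rng.normal(size=(200, 10))
y_new = rng.integers(0, 2, size=200)
pred = recommender.predict(meta_features(X_new, y_new).reshape(1, -1))[0]
print([k for k, keep in zip(KERNELS, pred) if keep])
```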

4.
MOTIVATION: Microarrays are capable of determining the expression levels of thousands of genes simultaneously. In combination with classification methods, this technology can be useful to support clinical management decisions for individual patients, e.g. in oncology. The aim of this paper is to systematically benchmark the role of non-linear versus linear techniques and dimensionality reduction methods. RESULTS: A systematic benchmarking study is performed by comparing linear versions of standard classification and dimensionality reduction techniques with their non-linear versions based on a radial basis function (RBF) kernel. A total of 9 binary cancer classification problems, derived from 7 publicly available microarray datasets, and 20 randomizations of each problem are examined. CONCLUSIONS: Three main conclusions can be formulated based on performance on independent test sets. (1) When performing classification with least squares support vector machines (LS-SVMs) (without dimensionality reduction), RBF kernels can be used without risking too much overfitting. The results obtained with well-tuned RBF kernels are never worse, and sometimes even statistically significantly better, than results obtained with a linear kernel in terms of test set receiver operating characteristic and accuracy performance. (2) Even for classification with linear classifiers like LS-SVM with a linear kernel, using regularization is very important. (3) When performing kernel principal component analysis (kernel PCA) before classification, using an RBF kernel for kernel PCA tends to result in overfitting, especially when using supervised feature selection. It has been observed that an optimal selection of a large number of features is often an indication of overfitting. Kernel PCA with a linear kernel gives better results.
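A small sketch of the kind of comparison described: a regularized classifier with linear versus RBF kernel, plus kernel PCA as an optional preprocessing step. scikit-learn's SVC stands in for the LS-SVM used in the paper, and the synthetic high-dimensional data and hyperparameter grid are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# High-dimensional, small-sample data as a stand-in for microarray problems.
X, y = make_classification(n_samples=100, n_features=2000, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for kernel in ("linear", "rbf"):
    # The regularization parameter C is tuned; for RBF, gamma is tuned as well.
    grid = {"svc__C": [0.1, 1, 10]}
    if kernel == "rbf":
        grid["svc__gamma"] = ["scale", 1e-4, 1e-3]
    pipe = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    search = GridSearchCV(pipe, grid, cv=5).fit(X_tr, y_tr)
    print(kernel, "test accuracy:", search.score(X_te, y_te))

# Linear-kernel PCA before a linear classifier (the safer combination reported).
pipe = make_pipeline(StandardScaler(),
                     KernelPCA(kernel="linear", n_components=20),
                     SVC(kernel="linear", C=1))
print("kPCA + linear SVM:", pipe.fit(X_tr, y_tr).score(X_te, y_te))
```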

5.
MOTIVATION: Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS: We introduce two classes of kernel functions that are constructed by combining sequence profiles with new and existing approaches for determining the similarity between pairs of protein sequences. These kernels are constructed directly from explicit protein similarity measures and employ effective profile-to-profile scoring schemes for measuring the similarity between pairs of proteins. Experiments with remote homology detection and fold recognition problems show that these kernels are capable of producing results that are substantially better than those produced by all of the existing state-of-the-art SVM-based methods. In addition, the experiments show that these kernels, even when used in the absence of profiles, produce results that are better than those produced by existing non-profile-based schemes. AVAILABILITY: The programs for computing the various kernel functions are available on request from the authors.

6.
In 3D single particle reconstruction, which involves the translational and rotational matching of a large number of electron microscopy (EM) images, the algorithmic performance is largely dependent on the efficiency and accuracy of the underlying 2D image alignment kernel. We present a novel fast rotational matching kernel for 2D images (FRM2D) that significantly reduces the cost of this alignment. The alignment problem is formulated using one translational and two rotational degrees of freedom. This allows us to take advantage of fast Fourier transforms (FFTs) in rotational space to accelerate the search over the two angular parameters, while the remaining translational parameter is explored, within a limited range, by exhaustive search. Since there are no boundary effects in FFTs of cyclic angular variables, we avoid the expensive zero padding associated with Fourier transforms in linear space. To verify the robustness of our method, efficiency and accuracy tests were carried out over a range of noise levels in realistic simulations of EM images. Performance tests against two standard alignment methods, resampling to polar coordinates and self-correlation, demonstrate that FRM2D compares very favorably to the traditional methods. FRM2D exhibits a comparable or higher robustness against noise and a significant gain in efficiency that depends on the fineness of the angular sampling and the linear search range.
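The core trick, searching a rotation angle with an FFT over a cyclic variable, can be sketched as follows: resample each image onto concentric rings, cross-correlate the angular profiles via the FFT, and pick the best rotation. This is a simplified one-angle illustration, not the full FRM2D kernel with its translational search; the image generation and sampling choices are assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates, rotate

def polar_profiles(img, n_radii=32, n_angles=128):
    """Sample an image on concentric rings around its center -> (radii, angles)."""
    cy, cx = (np.asarray(img.shape) - 1) / 2.0
    radii = np.linspace(2, min(cy, cx) - 1, n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    coords = np.array([cy + rr * np.sin(aa), cx + rr * np.cos(aa)])
    return map_coordinates(img, coords, order=1)

def best_rotation(img_a, img_b, n_angles=128):
    """Rotation offset between img_a and img_b, found via FFT correlation."""
    pa = polar_profiles(img_a, n_angles=n_angles)
    pb = polar_profiles(img_b, n_angles=n_angles)
    # Cyclic cross-correlation along the angular axis, summed over all rings.
    corr = np.fft.ifft(np.fft.fft(pa, axis=1) * np.conj(np.fft.fft(pb, axis=1)), axis=1)
    score = corr.real.sum(axis=0)
    return np.degrees(2 * np.pi * np.argmax(score) / n_angles)

rng = np.random.default_rng(3)
ref = rng.random((64, 64))
moved = rotate(ref, angle=30, reshape=False, order=1)  # rotate by a known angle
# Depending on axis/rotation sign conventions, the estimate comes out as the
# rotation angle or 360 degrees minus it.
print("estimated rotation:", best_rotation(ref, moved))
```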

7.
Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many methods for predicting protein complexes from protein-protein interactions have been developed, such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt only with complexes of size greater than three, because they are often based on some density measure of subgraphs. However, heterodimeric protein complexes, which consist of two distinct proteins, account for a large proportion of known complexes according to several comprehensive databases. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on its reliability. Furthermore, we make use of prior knowledge on protein domains to develop an additional feature space mapping, a domain composition kernel, and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. The results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes.
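One plausible (assumed) reading of a "domain composition" feature map is sketched below: a candidate protein pair is represented by counts of domain-pair combinations, and a linear kernel on these counts is fed to an SVM. The domain vocabulary, the pairing rule, and the labels are all illustrative stand-ins, not the paper's definition.

```python
import numpy as np
from itertools import combinations_with_replacement
from sklearn.svm import SVC

DOMAINS = sorted(["kinase", "SH2", "SH3", "PDZ", "WD40"])   # assumed vocabulary
PAIR_INDEX = {p: i for i, p in enumerate(combinations_with_replacement(DOMAINS, 2))}

def domain_composition_features(domains_a, domains_b):
    """Count vector over unordered domain pairs (one domain from each protein)."""
    v = np.zeros(len(PAIR_INDEX))
    for da in domains_a:
        for db in domains_b:
            v[PAIR_INDEX[tuple(sorted((da, db)))]] += 1
    return v

# Toy training data: candidate pairs labelled as heterodimer (1) or not (0).
rng = np.random.default_rng(12)
pairs = [(list(rng.choice(DOMAINS, size=rng.integers(1, 3))),
          list(rng.choice(DOMAINS, size=rng.integers(1, 3))))
         for _ in range(60)]
X = np.array([domain_composition_features(a, b) for a, b in pairs])
y = rng.integers(0, 2, size=60)              # stand-in labels

# A linear kernel on the count vectors corresponds to the composition kernel itself.
clf = SVC(kernel="linear").fit(X, y)
print("training accuracy on toy data:", clf.score(X, y))
```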

8.
Using time-domain correlation techniques, the first- and second-order Wiener kernels have been calculated for the system mediating the human visual evoked response. The first-order kernels indicate that the linear element is a resonant one, with a natural frequency near 20 Hz and a memory of approximately 250 ms. The transport delay associated with this element is approximately 56 ms. The second-order kernels indicate a quadratic nonlinear element with a memory of less than 20 ms. The analytic form of this element can be approximated by a parabola shifted to the right of the origin. A close correspondence between the spectrum of the first-order kernel and the spectrum of the main diagonal of the second-order kernel suggests the nonlinear element precedes the linear one. Tests of reproducibility on the first-order kernel and the main diagonal of the second-order kernel suggest they are reliable describing functions for the system mediating the human visual evoked response.
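For a white-noise input, the first-order Wiener kernel can be estimated by cross-correlating the stimulus with the response (the Lee–Schetzen method), which is the kind of time-domain correlation the abstract refers to. The sketch below applies that formula to a synthetic delayed 20 Hz resonance; the stimulus statistics, sampling rate, and system are assumptions standing in for the visual-evoked-response experiment.

```python
import numpy as np

fs = 1000.0                        # sampling rate (Hz), assumed
n = 20000
rng = np.random.default_rng(4)
x = rng.standard_normal(n)         # white-noise stimulus, variance ~1

# Synthetic system: 56 ms transport delay followed by a damped ~20 Hz resonance.
t = np.arange(0, 0.25, 1 / fs)
impulse = np.exp(-t / 0.05) * np.sin(2 * np.pi * 20 * t)
delay = int(0.056 * fs)
h_true = np.concatenate([np.zeros(delay), impulse])
y = np.convolve(x, h_true)[:n] + 0.1 * rng.standard_normal(n)

# Lee-Schetzen estimate: h1(tau) = E[y(t) x(t - tau)] / sigma_x^2.
max_lag = len(h_true)
sigma2 = x.var()
h1 = np.array([np.mean(y[lag:] * x[:n - lag]) for lag in range(max_lag)]) / sigma2

print("correlation between estimated and true first-order kernel:",
      round(np.corrcoef(h1, h_true)[0, 1], 3))
```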

9.
A new type of supervised learning algorithm for estimating multidimensional functions is considered. These methods, based on Support Vector Machines, are widely used due to their ability to deal with high-dimensional and large datasets and their flexibility in modeling diverse sources of data. Support vector machines and related kernel methods are extremely good at solving prediction problems in computational biology. Background on statistical learning theory and kernel feature spaces is given, including practical and algorithmic considerations.

10.
Saccade and smooth pursuit are two important functions of the human eye. In order to enable a bionic eye to imitate these two functions, a control method that implements saccade and smooth pursuit based on the three-dimensional coordinates of a target is proposed. An optimal observation position is defined for the bionic eye based on three-dimensional coordinates. A motion planning method with high accuracy is developed. The motion parameters of the stepper motors, consisting of angular acceleration and turning time, are computed from the position deviation, the target's angular velocity, and the stepper motor's current angular velocity. The motors are driven with these motion parameters, moving to the given position with the desired angular velocity in the scheduled time. Experimental results show that the bionic eye can move to optimal observation positions within 0.6 s from the initial location and that the accuracy of the 3D coordinates is improved. In addition, the bionic eye can track a target with an error of less than 20 pixels based on three-dimensional coordinates. This verifies that saccade and smooth pursuit of a bionic eye based on three-dimensional coordinates are feasible.
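The motion-parameter computation can be illustrated with a two-phase profile: accelerate at a constant rate until the desired angular velocity is reached, then coast at that velocity so that the total swept angle equals the position deviation within the scheduled time. This is a minimal sketch of one such profile; the paper's actual planning equations and motor interface are not reproduced, and the variable names are assumptions.

```python
def plan_motion(delta_theta, omega_now, omega_target, total_time):
    """Constant acceleration for t1 seconds, then constant omega_target.

    Solves  omega_target = omega_now + alpha * t1
            delta_theta  = omega_now*t1 + 0.5*alpha*t1**2 + omega_target*(total_time - t1)
    for the angular acceleration alpha and the turning (acceleration) time t1.
    """
    if abs(omega_target - omega_now) < 1e-12:
        return 0.0, 0.0                       # already at the target velocity
    # Eliminating alpha: delta_theta = 0.5*(omega_now - omega_target)*t1 + omega_target*T
    denom = 0.5 * (omega_now - omega_target)
    t1 = (delta_theta - omega_target * total_time) / denom
    if not 0.0 < t1 <= total_time:
        raise ValueError("no feasible two-phase profile for these parameters")
    alpha = (omega_target - omega_now) / t1
    return alpha, t1

# Example: move 30 deg in 0.6 s, from rest to a 60 deg/s tracking velocity.
alpha, t1 = plan_motion(delta_theta=30.0, omega_now=0.0, omega_target=60.0, total_time=0.6)
print(f"angular acceleration = {alpha:.1f} deg/s^2, turning time = {t1:.3f} s")
```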

11.
Kernel bandwidth optimization in spike rate estimation
A kernel smoother and a time histogram are classical tools for estimating an instantaneous rate of spike occurrences. We recently established a method for selecting the bin width of the time histogram, based on the principle of minimizing the mean integrated square error (MISE) between the estimated rate and the unknown underlying rate. Here we apply the same optimization principle to kernel density estimation in selecting the width, or "bandwidth", of the kernel, and further extend the algorithm to allow a variable bandwidth, in conformity with the data. The variable kernel has the potential to accurately grasp non-stationary phenomena, such as abrupt changes in the firing rate, which we often encounter in neuroscience. In order to avoid possible overfitting that may take place due to excessive freedom, we introduce a stiffness constant for bandwidth variability. Our method automatically adjusts the stiffness constant, thereby adapting to the entire set of spike data. It is revealed that the classical kernel smoother may exhibit goodness-of-fit comparable to, or even better than, that of modern sophisticated rate estimation methods, provided that the bandwidth is selected properly for a given set of spike data, according to the optimization methods presented here.
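A minimal, fixed-bandwidth version of the idea can be sketched with least-squares cross-validation, a standard MISE-motivated criterion for kernel estimators. This is not the paper's exact algorithm and omits the variable-bandwidth extension and stiffness constant; the spike times, candidate bandwidths, and Gaussian kernel are assumptions.

```python
import numpy as np

def gauss(u, w):
    return np.exp(-0.5 * (u / w) ** 2) / (np.sqrt(2 * np.pi) * w)

def lscv_cost(spikes, w):
    """Least-squares CV estimate of the MISE (up to a constant) for bandwidth w."""
    n = len(spikes)
    d = spikes[:, None] - spikes[None, :]
    # Integral of the squared density estimate has a closed form for Gaussians.
    int_f2 = gauss(d, np.sqrt(2) * w).sum() / n ** 2
    # Leave-one-out term: exclude the diagonal (each spike's own kernel).
    loo = (gauss(d, w).sum() - n * gauss(0.0, w)) / (n * (n - 1))
    return int_f2 - 2 * loo

rng = np.random.default_rng(5)
# Synthetic spike train: rate jumps from 5 Hz to 30 Hz at t = 1 s (2 s total).
spikes = np.sort(np.concatenate([rng.uniform(0, 1, rng.poisson(5)),
                                 rng.uniform(1, 2, rng.poisson(30))]))

widths = np.logspace(-2, 0, 30)
best_w = widths[np.argmin([lscv_cost(spikes, w) for w in widths])]

# Rate estimate: number of spikes times the density, on a fine time grid.
t = np.linspace(0, 2, 400)
rate = gauss(t[:, None] - spikes[None, :], best_w).sum(axis=1)
print("selected bandwidth (s):", round(best_w, 3), " peak rate (Hz):", round(rate.max(), 1))
```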

12.
13.
Contingent kernel density estimation
Kernel density estimation is a widely used method for estimating a distribution based on a sample of points drawn from that distribution. Generally, in practice some form of error contaminates the sample of observed points. Such error can be the result of imprecise measurements or observation bias. Often this error is negligible and may be disregarded in analysis. In cases where the error is non-negligible, estimation methods should be adjusted to reduce the resulting bias. Several modifications of kernel density estimation have been developed to address specific forms of error. One form of error that has not yet been addressed is the case where observations are nominally placed at the centers of areas from which the points are assumed to have been drawn, where these areas are of varying sizes. In this scenario, bias arises because the size of the error can vary among points: some subset of points can be known to have smaller error than another subset, or the form of the error may change among points. This paper proposes a "contingent kernel density estimation" technique to address this form of error. This new technique adjusts the standard kernel on a point-by-point basis in an adaptive response to the changing structure and magnitude of the error. In this paper, equations for the contingent kernel technique are derived, the technique is validated using numerical simulations, and an example using the geographic locations of social networking users is worked through to demonstrate the utility of the method.
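The core adjustment, letting each observation carry its own kernel width matched to the size of the area it was reported from, can be written as a short per-point-bandwidth density estimator. This is a minimal one-dimensional sketch of that idea, not the paper's exact contingent kernel; the rule linking area size to bandwidth is an assumption.

```python
import numpy as np

def contingent_kde(grid, points, area_sizes, scale=0.5):
    """KDE in which each point's Gaussian bandwidth grows with its reporting area.

    points[i] is the nominal (center) location, area_sizes[i] the width of the
    region it could actually have come from; bandwidth_i = scale * area_sizes[i].
    """
    bw = scale * np.asarray(area_sizes)                     # per-point bandwidths
    u = (grid[:, None] - points[None, :]) / bw[None, :]
    k = np.exp(-0.5 * u ** 2) / (np.sqrt(2 * np.pi) * bw[None, :])
    return k.mean(axis=1)                                   # average of per-point kernels

rng = np.random.default_rng(6)
true_locs = rng.normal(0, 1, 200)
area_sizes = rng.choice([0.1, 0.5, 2.0], size=200)          # mixed reporting precision
reported = np.round(true_locs / area_sizes) * area_sizes    # snapped to area centers

grid = np.linspace(-4, 4, 400)
density = contingent_kde(grid, reported, area_sizes)
print("integrates to ~1:", round(np.trapz(density, grid), 3))
```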

14.
A data-smoothing filter has been developed that permits improving the accuracy of individual elements of a bivariate flow cytometry (FCM) histogram by making use of data from adjacent elements, a knowledge of the two-dimensional measurement system point spread function (PSF), and the local count density. For FCM data, the PSF is assumed to be a set of two-dimensional Gaussian functions with a constant coefficient of variation for each axis. A set of space-variant smoothing kernels is developed from the basic PSF by adjusting the orthogonal standard deviations of each Gaussian smoothing kernel according to the local count density. This adjustment in kernel size matches the degree of smoothing to the local reliability of the data. When the count density is high, a small kernel is sufficient; when the density is low, however, a broader kernel should be used. The local count density is taken from a region defined by the measurement PSF. The smoothing algorithm permits the reduction of statistical fluctuations present in bivariate FCM histograms due to the low count densities often encountered in some elements. This reduction in high-frequency spatial noise aids in the visual interpretation of the data. Additionally, by making more efficient use of smaller samples, systematic errors due to system drift may be minimized.
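The space-variant filter can be sketched by precomputing a bank of Gaussian-smoothed copies of the histogram and, for each bin, selecting the copy whose width matches the local count density (wider kernels where counts are sparse). The density-to-width mapping, the kernel bank, and the toy histogram below are assumptions, not the paper's calibrated PSF model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_smooth(hist, sigmas=(0.5, 1.0, 2.0, 4.0), psf_sigma=1.0):
    """Smooth each bin of a 2D histogram with a width chosen from local count density."""
    # Local count density estimated over a PSF-sized neighborhood.
    local = gaussian_filter(hist.astype(float), sigma=psf_sigma)
    # Bank of candidate smoothings, from narrow to broad.
    bank = np.stack([gaussian_filter(hist.astype(float), sigma=s) for s in sigmas])
    # Map density quantiles to kernel indices: high density -> narrow kernel.
    edges = np.quantile(local, np.linspace(0, 1, len(sigmas) + 1)[1:-1])
    rank = np.searchsorted(edges, local.ravel()).reshape(local.shape)
    idx = (len(sigmas) - 1) - rank
    return np.take_along_axis(bank, idx[None, ...], axis=0)[0]

rng = np.random.default_rng(7)
# Toy bivariate histogram: a dense population plus a sparse one.
x = np.concatenate([rng.normal(20, 3, 5000), rng.normal(45, 6, 300)])
y = np.concatenate([rng.normal(30, 4, 5000), rng.normal(50, 5, 300)])
hist, _, _ = np.histogram2d(x, y, bins=64, range=[[0, 64], [0, 64]])
smoothed = adaptive_smooth(hist)
print(hist.sum(), round(smoothed.sum(), 1))   # total counts are roughly preserved
```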

15.
The recovery rate is essential to the estimation of a portfolio's loss and economic capital. Neglecting the randomness of the distribution of recovery rates may underestimate the risk. This study introduces two kinds of distribution models, Beta distribution estimation and kernel density estimation, to simulate the distribution of recovery rates of corporate loans and bonds. Models based on the Beta distribution are common in practice, for example CreditMetrics by J.P. Morgan, Portfolio Manager by KMV and LossCalc by Moody's. However, the Beta distribution has a fatal defect: it cannot fit bimodal or multimodal distributions, such as the recovery rates of corporate loans and bonds, as Moody's new data show. In order to overcome this flaw, kernel density estimation is introduced, and we compare the simulation results of the histogram, Beta distribution estimation and kernel density estimation to reach the conclusion that the Gaussian kernel density really better imitates the distribution of bimodal or multimodal samples of corporate loan and bond recovery rates. Finally, a Chi-square test of the Gaussian kernel density estimation shows that it can fit the curve of recovery rates of loans and bonds. Thus, using the kernel density distribution to precisely delineate the bimodal recovery rates of bonds is optimal in credit risk management.
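The comparison can be sketched directly with scipy: fit a Beta distribution and a Gaussian kernel density estimate to the same bimodal recovery-rate sample and check which one reproduces the two modes. The synthetic recovery rates below are an assumption standing in for the Moody's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
# Synthetic bimodal recovery rates: a low-recovery and a high-recovery cluster.
recovery = np.clip(np.concatenate([rng.normal(0.15, 0.07, 400),
                                   rng.normal(0.80, 0.08, 600)]), 0.001, 0.999)

# Unimodal Beta fit (support fixed to [0, 1]).
a, b, loc, scale = stats.beta.fit(recovery, floc=0, fscale=1)

# Gaussian kernel density estimate.
kde = stats.gaussian_kde(recovery)

# Compare fit quality against the empirical histogram.
hist, edges = np.histogram(recovery, bins=20, range=(0, 1), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print("sum sq. error, Beta:", round(np.sum((stats.beta.pdf(centers, a, b) - hist) ** 2), 2))
print("sum sq. error, KDE :", round(np.sum((kde(centers) - hist) ** 2), 2))
```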

16.
We study the problem of estimating the density of a random variable G, given observations of a random variable Y = G + E. The random variable E is independent of G, and its probability distribution function is considered known. We build a family of estimators of the density of G using characteristic functions, and then derive a family of estimators of the density of Y based on the model for Y. The estimators are shown to be asymptotically unbiased and consistent. Simulations show that these estimators are better, as measured by integrated squared error, than standard kernel estimators. Finally, we give an example of the use of this method for the detection of major genes in animal populations.
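The characteristic-function construction can be sketched with the classical deconvolution kernel estimator: divide the empirical characteristic function of Y by the known characteristic function of E, damp with a kernel whose Fourier transform has compact support, and invert. The Gaussian error, the specific damping kernel, and the bandwidth below are assumptions, not necessarily the authors' estimator.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2000
g = rng.gamma(shape=4.0, scale=1.0, size=n)     # unobserved variable G
sigma_e = 0.8
y = g + rng.normal(0, sigma_e, size=n)          # observed Y = G + E, E ~ N(0, sigma_e^2)

def deconv_density(x_grid, y, sigma_e, h=0.35, n_t=2001):
    """Deconvolution kernel estimate of the density of G from Y = G + E."""
    t_max = 1.0 / h                              # damping kernel c.f. vanishes beyond 1/h
    t = np.linspace(-t_max, t_max, n_t)
    phi_y = np.exp(1j * np.outer(t, y)).mean(axis=1)       # empirical c.f. of Y
    phi_e = np.exp(-0.5 * (sigma_e * t) ** 2)               # known c.f. of the error
    phi_k = np.where(np.abs(h * t) <= 1, (1 - (h * t) ** 2) ** 3, 0.0)  # kernel c.f.
    integrand = phi_k * phi_y / phi_e
    # Inverse Fourier transform on the grid: f(x) = (1/2pi) * int e^{-itx} ... dt
    f = np.trapz(np.exp(-1j * np.outer(x_grid, t)) * integrand[None, :], t, axis=1).real
    return np.clip(f / (2 * np.pi), 0, None)

x = np.linspace(0, 12, 200)
f_hat = deconv_density(x, y, sigma_e)
print("integrates to ~1:", round(np.trapz(f_hat, x), 2))
```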

17.
Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from nonmembers and hence cannot detect noncoding RNA regions in genome sequences. A novel kernel function, the stem kernel, is proposed for the discrimination and detection of functional RNA sequences using support vector machines (SVMs). The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. The stem kernel examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots, between two RNA sequences, and calculates the inner product of common stem structure counts. An efficient algorithm based on dynamic programming is developed to calculate the stem kernel. The stem kernel is then applied to discriminate members of an RNA family from nonmembers using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Furthermore, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures, motivated by the fact that the string kernel has been proven to work for remote homology detection of protein sequences. These experimental results have convinced us to apply the stem kernel to finding novel RNA families in genome sequences.
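The stem kernel is described as an extension of the all-subsequences string kernel; the latter has a compact dynamic-programming evaluation, sketched below, that counts pairs of matching (possibly gapped) subsequences between two sequences. This illustrates only the string-kernel base, not the stem kernel's base-pair, stem, or pseudoknot handling.

```python
import numpy as np

def all_subsequences_kernel(s, t):
    """K(s, t) = number of pairs of identical (possibly non-contiguous) subsequences.

    DP recursion: K(s + a, t) = K(s, t) + sum over positions j with t[j] == a
    of K(s, t[:j]). The empty subsequence is counted, so K("", t) = 1.
    """
    n, m = len(s), len(t)
    K = np.ones((n + 1, m + 1), dtype=np.int64)   # K[i, j] = kernel of s[:i], t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            K[i, j] = K[i - 1, j]
            for l in range(1, j + 1):
                if t[l - 1] == s[i - 1]:
                    K[i, j] += K[i - 1, l - 1]
    return int(K[n, m])

# Small RNA-like examples; more shared subsequences -> larger kernel value.
print(all_subsequences_kernel("GCAU", "GCAU"))
print(all_subsequences_kernel("GCAU", "AUGC"))
```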

18.
We describe a new method for estimating the area of home ranges and constructing utilization distributions (UDs) from spatial data. We compare our method with bivariate kernel and α-hull methods, using both randomly distributed and highly aggregated data to test the accuracy of area estimates and UD isopleth construction. The data variously contain holes, corners, and corridors linking high-use areas. Our method is based on taking the union of the minimum convex polygons (MCPs) associated with the k−1 nearest neighbors of each point in the data and, as such, has one free parameter k. We propose a "minimum spurious hole covering" (MSHC) rule for selecting k and interpret its application in terms of type I and type II statistical errors. Our MSHC rule provides estimates within 12% of true area values for all 5 data sets, while kernel methods are worse in all cases: in one case overestimating area by a factor of 10 and in another case underestimating area by a factor of 50. Our method also constructs much better estimates of the density isopleths of the UDs than kernel methods. The α-hull method does not lead directly to the construction of isopleths and also does not always include all points in the constructed home range. Finally, we demonstrate that kernel methods, unlike our method and the α-hull method, do not converge to the true area represented by the data as the number of data points increases.
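A minimal sketch of the construction, assuming shapely for polygon unions: for each point, take the convex hull of it and its k−1 nearest neighbors, union all the hulls, and report the area. Choosing k by the MSHC rule is not implemented here; the value of k and the toy data are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree
from shapely.geometry import MultiPoint
from shapely.ops import unary_union

def knn_hull_home_range(points, k):
    """Union of convex hulls of each point's k-1 nearest neighbors (plus itself)."""
    tree = cKDTree(points)
    hulls = []
    for p in points:
        _, idx = tree.query(p, k=k)            # the point itself plus k-1 neighbors
        hulls.append(MultiPoint([tuple(q) for q in points[idx]]).convex_hull)
    return unary_union(hulls)

rng = np.random.default_rng(10)
# Two aggregated use areas joined by a sparse corridor.
cluster_a = rng.normal([0, 0], 1.0, size=(120, 2))
cluster_b = rng.normal([8, 0], 1.0, size=(120, 2))
corridor = np.column_stack([rng.uniform(1, 7, 20), rng.normal(0, 0.2, 20)])
points = np.vstack([cluster_a, cluster_b, corridor])

home_range = knn_hull_home_range(points, k=8)
print("estimated home-range area:", round(home_range.area, 1))
```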

19.
Kernel methods for the species abundance distributions of two rare plant communities
A nonparametric kernel density estimation method for species abundance distributions is proposed for the first time, and the construction and main properties of the method are introduced. Fitting results for the species abundance distributions of the tree layer, the shrub layer, and all woody plants in communities of the rare and endangered plants Tsoongiodendron odorum (观光木) and Tsuga longibracteata (长苞铁杉) show that the kernel method describes community species abundance distributions very well. The nonparametric kernel estimation method is an effective approach for modeling community species abundance distributions; it enriches the set of methods for fitting species abundance distributions and provides a theoretical reference for the management and conservation of rare and endangered plants.
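As a hedged illustration of the approach (not the authors' exact estimator or data), the sketch below fits a Gaussian kernel density estimate to a synthetic vector of species abundances on a log2 "octave" scale and compares it with the empirical histogram.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
# Synthetic community: a log-normal-ish abundance vector (many rare, few common species).
abundances = np.round(rng.lognormal(mean=2.0, sigma=1.2, size=80)).astype(int) + 1

# Kernel density estimate of the abundance distribution on the log2 octave scale.
log_ab = np.log2(abundances)
kde = stats.gaussian_kde(log_ab)

hist, edges = np.histogram(log_ab, bins=8, density=True)
centers = (edges[:-1] + edges[1:]) / 2
print("fitted density at octave centers:  ", np.round(kde(centers), 3))
print("empirical density at octave centers:", np.round(hist, 3))
```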

20.
