Similar articles
20 similar articles found.
1.
Choosing an appropriate kernel is critical when classifying a new problem with a Support Vector Machine (SVM). To date, much attention has been paid to constructing new kernels and choosing suitable parameter values for a specific kernel function, but less to kernel selection itself. Furthermore, most current kernel selection methods seek the single best kernel, i.e. the one with the highest classification accuracy under cross-validation; they are time consuming and ignore the differences in the number of support vectors and the CPU time of SVMs with different kernels. Considering the trade-off between classification success ratio and CPU time, multiple kernel functions may perform equally well on the same classification problem. To automatically select appropriate kernel functions for a given data set, we propose a multi-label-learning-based kernel recommendation method built on data characteristics. For each data set, a meta-knowledge data base is first created by extracting the feature vector of data characteristics and identifying the corresponding set of applicable kernels. A kernel recommendation model is then constructed on the meta-knowledge data base with a multi-label classification method. Finally, the model recommends appropriate kernel functions for a new data set according to that data set's characteristics. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoid, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with existing kernel selection methods and the widely used RBF kernel, an SVM using the kernel recommended by our method achieves the highest classification performance.
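The recommendation pipeline described above (meta-features in, applicable kernel set out) can be sketched with a tiny binary-relevance nearest-neighbour recommender. Everything below is illustrative: the meta-knowledge base, the three meta-features and the kernel labels are invented, and the paper's five multi-label classification methods are replaced by the simplest possible stand-in.

```python
import math

# Hypothetical meta-knowledge base: one entry per training data set,
# pairing a meta-feature vector (n_samples, n_features, minority class
# ratio -- illustrative choices) with the set of kernels that performed
# acceptably on it.
META_KB = [
    ((100, 4, 0.50), {"rbf", "laplace"}),
    ((150, 4, 0.33), {"rbf", "poly"}),
    ((5000, 2, 0.50), {"linear", "rbf"}),
    ((4000, 3, 0.48), {"linear"}),
]

def recommend_kernels(meta_features, k=2):
    """Binary-relevance k-NN recommendation: find the k training data
    sets nearest in meta-feature space and return every kernel that a
    strict majority of those neighbours marked as applicable."""
    neighbours = sorted(
        META_KB, key=lambda entry: math.dist(entry[0], meta_features)
    )[:k]
    votes = {}
    for _, kernels in neighbours:
        for kern in kernels:
            votes[kern] = votes.get(kern, 0) + 1
    return {kern for kern, v in votes.items() if 2 * v > k}

# A large, low-dimensional data set lands near the 'linear' entries.
print(recommend_kernels((4500, 2, 0.49)))
```

In the paper the recommender is a trained multi-label classifier rather than a neighbour vote, but the interface is the same: characteristics of a new data set go in, a set of recommended kernels comes out.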

2.
Question: (i) How do former land use and land use intensity affect seed bank development during post-agricultural succession? (ii) How does time since the last clear-cut change seed bank composition during post-clear-cut succession? Methods: One data set was compiled per succession type using the following selection criteria: (i) the data set included a successional series, (ii) plots were located in mesotrophic forest plant communities and (iii) vegetation data were available. The post-agricultural succession data set comprised 76 recent forest plots (eight studies); the post-clear-cut succession data set comprised 218 ancient forest plots (three studies). Each data set was analysed separately using either linear mixed models or generalized linear models, controlling for both environmental heterogeneity and variation between study locations. Results: In the post-agricultural succession data set, land use and time significantly affected nearly all the studied seed bank characteristics. Seed banks on former arable land recovered poorly even after 150 years of restored forest cover, whereas moderate land use intensities (grasslands, heathlands) yielded more rapid seed bank recovery. Time was a significant determinant of all but two soil seed bank characteristics during post-clear-cut succession. Seed banks in managed ancient forest differed strongly in their characteristics from primary forest seed banks. Conclusions: Forest seed banks bear the marks of former land use and/or forest management, and continue to do so for at least 150 years. Nevertheless, time since the last major disturbance, be it former land use or clear-cutting, remains a significant determinant of the seed bank.

3.
As nature and society change, risks inevitably change as well. This implies that, over time, some historical data become invalid for probabilistic risk analysis. In this paper, we suggest a model for acquiring valid data, based on the Mann-Kendall test for detecting abrupt change points in time series. Typhoon risk analysis in Guangdong Province, China, is used as a case study to show how to apply the model. Valid data on typhoon intensities and related losses in the province are obtained for probabilistic risk analysis from records spanning 1984 to 2012. Compared with results based on the invalid data and on all collected data, the risk assessed from the valid data is more reliable and better reflects the dynamics of typhoon risk.
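The abstract gives no formulas, so the following is a generic, stdlib-only sketch of the classic Mann-Kendall trend statistic on which such change-point tests build (the sequential, abrupt-change variant used in the paper is not reproduced here, and the no-tie variance formula is assumed).

```python
import math

def mann_kendall_z(series):
    """Classic Mann-Kendall trend test. S counts concordant minus
    discordant pairs; under the null hypothesis of no trend, the
    continuity-corrected Z below is approximately standard normal
    (variance formula without tie correction)."""
    n = len(series)
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n) for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0

# A monotonically increasing series yields a large positive Z.
print(mann_kendall_z([1, 2, 3, 5, 4, 7, 8, 9, 11, 12]))
```

A sequential application of this statistic (forward and backward in time) is one standard way to locate the abrupt change point that separates outdated records from data still valid for risk analysis.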

4.
Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied to gene expression data from cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here extends gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets differ significantly between these conditions. The interest of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing and a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA identified four gene sets ultimately found to be linked to the influenza vaccine as well, although previous analyses had associated them only with the pneumococcal vaccine. In our simulation study TcGSA exhibits good statistical properties and increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available to the community through an R package.

5.
Ewing G, Nicholls G, Rodrigo A. Genetics 2004, 168(4):2407-2420
We present a Bayesian statistical inference approach for simultaneously estimating mutation rate, population sizes, and migration rates in an island-structured population, using temporal and spatial sequence data. Markov chain Monte Carlo is used to collect samples from the posterior probability distribution. We demonstrate that this chain implementation successfully reaches equilibrium and recovers the truth for simulated data. A real HIV DNA sequence data set with two demes, semen and blood, is used as an example to demonstrate the method by fitting asymmetric migration rates and different population sizes. This data set exhibits a bimodal joint posterior distribution, with modes favoring different preferred migration directions. The full data set was subsequently split temporally for further analysis. Qualitative behavior of one subset was similar to the bimodal distribution observed with the full data set. The temporally split data showed significant differences in the posterior distributions and estimates of parameter values over time.

6.
Luque SP, Fried R. PLoS ONE 2011, 6(1):e15850
Zero-offset correction of diving depth measured by time-depth recorders is required to remove artifacts arising from temporal changes in the accuracy of pressure transducers. Currently used methods for this procedure live in the proprietary software domain, where researchers cannot study them in sufficient detail, so they have little or no control over how their data are changed. The GNU R package diveMove implements a procedure in the Free Software domain that consists of recursively smoothing and filtering the input time series using moving quantiles. This paper describes, demonstrates, and evaluates the proposed method using a "perfect" data set, which is subsequently corrupted to provide input for the proposed procedure. The method is evaluated by comparing the corrected time series to the original, uncorrupted data set from an Antarctic fur seal (Arctocephalus gazella Peters, 1875). The root mean square error of the corrected data set, relative to the "perfect" data set, was nearly identical to the magnitude of noise introduced into the latter. The method thus provides a flexible, reliable, and efficient mechanism for performing zero-offset correction in analyses of diving behaviour. We illustrate applications of the method to data sets from four species with large differences in diving behaviour, measured using different sampling protocols and instrument characteristics.
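diveMove's actual procedure smooths and filters recursively; the single-pass sketch below only illustrates the core idea of using a low moving quantile of the depth trace as an estimate of the drifting "zero", then subtracting it. The window size and quantile are arbitrary choices, not the package's defaults.

```python
def moving_quantile(x, window, q):
    """Running quantile over a centred window (edges truncated),
    using a simple nearest-rank definition."""
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = sorted(x[max(0, i - half): i + half + 1])
        out.append(seg[min(len(seg) - 1, int(q * len(seg)))])
    return out

def zero_offset_correct(depth, window=5, q=0.1):
    """Estimate the drifting surface offset as a low running quantile
    of the depth trace and subtract it, clipping at the surface (0)."""
    offset = moving_quantile(depth, window, q)
    return [max(0.0, d - o) for d, o in zip(depth, offset)]

# A dive of 20 units with a constant transducer offset of 0.5:
print(zero_offset_correct([0.5, 0.5, 10.5, 20.5, 10.5, 0.5, 0.5]))
```

With a drifting rather than constant offset, the moving window lets the estimated zero follow the drift, which is exactly what makes quantile smoothing attractive for this task.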

7.
This article discusses the problem of scheduling a large set of parts on a flexible manufacturing system (FMS) so as to minimize the total completion time. Here, the FMS consists of a set of parallel identical machines. A setup time is incurred whenever a machine switches from one type of part to another; the setup time may be large or small depending on whether or not the two part types belong to the same family. This article describes a fast heuristic for this scheduling problem and derives a lower bound on the optimal solution. In computational tests using random data and data from an IBM card test line, the heuristic achieves nearly optimal schedules.

8.
In this paper, we consider incomplete survival data: partly interval-censored failure time data, where the observed data include both exact and interval-censored observations on the survival time of interest. We present a class of generalized log-rank tests for this type of survival data and establish their asymptotic properties. The method is evaluated using simulation studies and illustrated with a set of real data from a diabetes study.

9.
Datta S, Satten GA, Datta S. Biometrics 2000, 56(3):841-847
In this paper, we present new nonparametric estimators of the stage-occupation probabilities in the three-stage irreversible illness-death model. These estimators use a fractional risk set and a reweighting approach, and are valid under stage-dependent censoring. Using a simulated data set, we compare the behavior of our estimators with previously proposed estimators. We also apply our estimators to data on time to Pneumocystis pneumonia and death obtained from an AIDS cohort study.

10.
Functional data analysis techniques provide an alternative way of representing movement and movement variability as functions of time. In particular, the registration of functional data provides a local normalization of time functions. This normalization transforms a set of curves (records of repeated trials) into a new set of curves that vary only in amplitude. As a result, the main events occur at the "same time" in all transformed curves, and interesting features of individual recordings survive averaging. This paper presents an application of the registration process to the analysis of the vertical forces exerted on the ground by both feet during the sit-to-stand movement. This movement is particularly interesting in functional evaluations related to balance control, lower extremity dysfunction or low-back pain.
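Registration as used in functional data analysis warps time continuously (landmark or continuous registration); the toy sketch below performs only a rigid shift to a single landmark (each curve's peak), which is enough to see why events then occur at the "same time" across trials. The padding rule and the choice of peak as landmark are simplifying assumptions.

```python
def landmark_register(curves, landmark_index=None):
    """Shift each curve so its maximum (a single landmark) falls at a
    common index; samples shifted past the ends are padded with the
    edge value. Real registration warps time smoothly instead of
    applying a rigid shift."""
    peaks = [max(range(len(c)), key=c.__getitem__) for c in curves]
    target = landmark_index if landmark_index is not None else max(peaks)
    out = []
    for c, p in zip(curves, peaks):
        shift = target - p
        if shift >= 0:
            shifted = [c[0]] * shift + c[: len(c) - shift]
        else:
            shifted = c[-shift:] + [c[-1]] * (-shift)
        out.append(shifted)
    return out

# Two force-like trials whose peaks are one sample apart:
print(landmark_register([[0, 1, 3, 1, 0], [1, 3, 1, 0, 0]]))
```

After registration the peaks coincide, so a pointwise average of the curves no longer smears the main event, which is the property the abstract highlights.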

11.
Recently, a complete set of data on the branching pattern of the cat's pulmonary arterial and venous trees and the elasticity of these blood vessels was obtained in our laboratory. Hence it becomes possible, for the first time, to perform a theoretical analysis of blood flow in the lung of an animal based on a set of actual data on anatomy and elasticity. This paper presents an analysis of steady blood flow in the cat's lung. The effect of vessel elasticity is embodied in the "fifth-power law" and the "sheet-flow" theory. The theory yields the pressure-flow relationship of the whole lung, the longitudinal pressure distribution, and the transit time of blood in the capillaries. These results are compared with available experimental data in the literature.

12.
Inertial sensors are now sufficiently small and lightweight to be used for the collection of large datasets from both humans and animals. However, processing these large datasets requires a certain degree of automation to achieve realistic workloads. Hidden Markov models (HMMs) are widely used stochastic pattern recognition tools that enable classification of non-stationary data. Here we apply HMMs to identify strides, and segment data into strides, from a trunk-mounted six-degrees-of-freedom inertial sensor in galloping Thoroughbred racehorses. A data set comprising mixed gait sequences from seven horses was subdivided into training, cross-validation and independent test sets. Manual gallop stride segmentations were created and used for training as well as for evaluating cross-validation and test set performance. On the test set, 91% of the strides were detected to lie within +/- 40 ms (< 10% of stride time) of the manually segmented stride starts. While the automated system did not miss any strides, it identified additional gallop strides at the beginning of the trials. In the light of the increasing use of inertial sensors for ambulatory measurements in clinical settings, automated processing techniques will be required for efficient data processing and instantaneous decision making from large amounts of data. In this context, automation is essential to gain optimal benefit from the potentially increased statistical power associated with the large number of strides that can be collected in a relatively short period of time. We propose the use of HMM-based classifiers since they are easy to implement. In the present study, consistent results across the cross-validation and test sets were achieved with limited training data.
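The stride-segmentation idea can be shown with a toy discrete-emission HMM decoded by the Viterbi algorithm. Everything here is illustrative: the two state names ("impact", "flight"), the symbolic "hi"/"lo" acceleration observations and all probabilities are invented, whereas the study trains on continuous six-degrees-of-freedom signals. Stride starts are read off as entries into the impact state.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely hidden-state path for a discrete-emission HMM."""
    logp = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for o in obs[1:]:
        new_logp, new_path = {}, {}
        for s in states:
            prev = max(states, key=lambda p: logp[p] + log_trans[p][s])
            new_logp[s] = logp[prev] + log_trans[prev][s] + log_emit[s][o]
            new_path[s] = path[prev] + [s]
        logp, path = new_logp, new_path
    return path[max(states, key=lambda s: logp[s])]

# Made-up two-state gallop model: "impact" mostly emits high vertical
# acceleration ("hi"), "flight" mostly emits low ("lo").
STATES = ["impact", "flight"]
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "impact": {"impact": math.log(0.2), "flight": math.log(0.8)},
    "flight": {"impact": math.log(0.4), "flight": math.log(0.6)},
}
LOG_EMIT = {
    "impact": {"hi": math.log(0.9), "lo": math.log(0.1)},
    "flight": {"hi": math.log(0.1), "lo": math.log(0.9)},
}

def stride_starts(acc_symbols):
    """Decode the sequence and return indices where the decoded path
    enters the impact state, i.e. candidate stride starts."""
    path = viterbi(acc_symbols, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    return [i for i, s in enumerate(path)
            if s == "impact" and (i == 0 or path[i - 1] != "impact")]

print(stride_starts(["hi", "lo", "lo", "hi", "lo", "lo"]))
```

Each detected entry into "impact" marks one stride, which is the discrete analogue of segmenting the continuous sensor trace at the manually labelled stride starts.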

13.
A prediction formula has been developed for estimating human endurance time from the aerobic and anaerobic fractions of total oxygen utilization. The derivation of the formula is based on the assumption that the fractional change in endurance time varies directly with the fractional change in the aerobic fraction and inversely with the fractional change in the anaerobic fraction. The validity of the prediction formula has been tested on two sets of data. The first set consists of 31 observations on 13 Indian subjects, and the second of 7 observations on one subject collected from the literature. The multiple correlations for these sets of data were 0.9650 and 0.9996, respectively, and were highly significant (p < 0.001). It is concluded that the aerobic and anaerobic fractions of total oxygen utilization are significant predictors of human endurance time.
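The stated assumption pins down the functional form of such a formula up to constants. A hedged reconstruction, where the symbols $T$ (endurance time), $A$ and $B$ (aerobic and anaerobic fractions) and the exponents $a$, $b$ are ours rather than the paper's:

$$\frac{dT}{T} = a\,\frac{dA}{A} - b\,\frac{dB}{B} \quad\Longrightarrow\quad \ln T = a\ln A - b\ln B + \text{const} \quad\Longrightarrow\quad T = c\,A^{a}B^{-b}.$$

Fitting $a$, $b$ and $c$ by log-linear multiple regression would then yield precisely the kind of multiple correlation coefficients reported in the abstract.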

14.
gm: a practical tool for automating DNA sequence analysis
The gm (gene modeler) program automates the identification of candidate genes in anonymous, genomic DNA sequence data. gm accepts sequence data, organism-specific consensus matrices and codon asymmetry tables, and a set of parameters as input; it returns a set of models describing the structures of candidate genes in the sequence and a corresponding set of predicted amino acid sequences as output. gm is implemented in C, and has been tested on Sun, VAX, Sequent, MIPS and Cray computers. It is capable of analyzing sequences of several kilobases containing multi-exon genes in >1 min execution time on a Sun 4/60. Received on December 4, 1989; accepted on February 28, 1990

15.
Real food chains are rarely investigated since long data sequences are required. Typically, if an ecosystem evolves with a period corresponding to the time to maturation, obtaining a few dozen cycles would require counting species over a few centuries. One well-known example of a long data set is the number of Canadian lynx furs caught by the Hudson Bay Company between 1821 and 1935, as reported by Elton and Nicholson in 1942. In spite of the relative quality of the data set (10 undersampled cycles), two low-dimensional global models that settle to chaotic attractors were obtained. They are compared with an ad hoc 3D model which was proposed as a possible model for this data set. The two global models, which were estimated with no prior knowledge of the dynamics, can be considered direct evidence of chaos in real ecosystems.

16.
A method based on Taylor series expansion for estimating the location parameters and variance components of non-linear mixed effects models is considered. An attractive property of the method is that it yields an easily implemented algorithm: estimation of non-linear mixed effects models can be carried out with common methods for linear mixed effects models, so existing programs can be used after small modifications. The applicability of this algorithm in animal breeding was studied by simulation using a Gompertz growth function model in pigs. Two growth data sets were analyzed: a full set containing observations from the entire growing period, and a truncated time-trajectory set containing animals slaughtered prematurely, which is common in pig breeding. The results from the 50 simulation replicates with the full data set indicate that the linearization approach was capable of estimating the original parameters satisfactorily. However, estimation of the parameters related to adult weight becomes unstable in the case of a truncated data set.

17.
R T O'Neill, C W Chen. Biometrics 1978, 34(3):411-420
A statistical model jointly characterizing the onset and termination of treatment response of a subject over a fixed observation period is presented. The model requires that observations for each subject be made at a set of pre-selected time points during the observation period. A useful index characterizing the probability of being in response is developed, along with maximum likelihood estimates and variances. A likelihood ratio test is developed to simultaneously compare two treatment groups with respect to this index at all times. The proposed procedure is applied to a set of data from a clinical trial of two bronchodilator drugs, which motivated the procedure.

18.
Here we develop a completely nonparametric method for comparing two groups on a set of longitudinal measurements. No assumptions are made about the form of the mean response function, the covariance structure, or the distributional form of disturbances around the mean response function. The solution proposed here is based on the realization that every longitudinal data set can also be thought of as a collection of survival data sets in which the events of interest are level crossings. The test for differences in the longitudinal measurements then proceeds as follows: for an arbitrarily large set of levels, determine for each subject the first time the subject has an upcrossing and a downcrossing of each level. For each level one then computes the log-rank statistic, and the maximum absolute value of all these statistics serves as the test statistic. By permuting group labels we obtain a permutation test of the hypothesis that the joint distribution of the measurements over time does not depend on group membership. Simulations are performed to investigate the power, and the method is applied to the area that motivated it: the analysis of microarrays. In this area, small sample sizes, few time points, and far too many genes to consider genuine gene-level longitudinal modeling have created a need for a simple, model-free test to screen for interesting features in the data.
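A compressed, stdlib-only sketch of the level-crossing idea. The paper computes a log-rank statistic per level (and uses both up- and downcrossings); to keep this sketch short, a mean first-upcrossing-time difference stands in for the log-rank statistic, while the max-over-levels statistic and the label-permutation machinery are retained.

```python
import random

def first_upcrossing(series, level):
    """Index of the first sample at or above `level` (None if never)."""
    for t, v in enumerate(series):
        if v >= level:
            return t
    return None

def max_level_stat(group_a, group_b, levels):
    """Max over levels of |mean first-upcrossing-time difference|.
    (A stand-in for the per-level log-rank statistic of the paper.)"""
    best = 0.0
    for lev in levels:
        ta = [t for t in (first_upcrossing(s, lev) for s in group_a) if t is not None]
        tb = [t for t in (first_upcrossing(s, lev) for s in group_b) if t is not None]
        if ta and tb:
            best = max(best, abs(sum(ta) / len(ta) - sum(tb) / len(tb)))
    return best

def permutation_pvalue(group_a, group_b, levels, n_perm=200, seed=0):
    """Permute group labels to get the null distribution of the
    max-over-levels statistic."""
    rng = random.Random(seed)
    observed = max_level_stat(group_a, group_b, levels)
    pooled = group_a + group_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_stat = max_level_stat(pooled[:len(group_a)], pooled[len(group_a):], levels)
        if perm_stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Small groups keep the attainable p-values coarse (with three subjects per group only twenty label splits exist), which mirrors the small-sample microarray setting that motivated the method.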

19.
In a recent paper, we showed that the value of a nonlinear quantity computed from scalp electrode data was correlated with the time to a seizure in patients with temporal lobe epilepsy. In this paper we study the relationship between the linear and nonlinear content and analyses of the scalp data. We do this in two ways. First, using surrogate data methods, we show that there is important nonlinear structure in the scalp electrode data to which our methods are sensitive. Second, we study the behavior of some simple linear metrics on the same set of scalp data to see whether the nonlinear metrics contain additional information not carried by the linear measures. We find that, while the nonlinear measures are correlated with time to seizure, the linear measures are not, over the time scales we have defined. The linear and nonlinear measures are themselves apparently linearly correlated, but that correlation can be ascribed to the influence of a small set of outliers associated with muscle artifact. A remaining, more subtle relation between the variance of the values of a nonlinear measure and the expectation value of a linear measure persists. Implications of our observations are discussed.

20.
MOTIVATION: Accurate time series for biological processes are difficult to estimate due to problems of synchronization, temporal sampling and rate heterogeneity. Methods are needed that can utilize multi-dimensional data, such as those resulting from DNA microarray experiments, in order to reconstruct time series from unordered or poorly ordered sets of observations. RESULTS: We present a set of algorithms for estimating temporal orderings from unordered sets of sample elements. The techniques we describe are based on modifications of a minimum spanning tree calculated from a weighted, undirected graph. We demonstrate the efficacy of our approach by applying these techniques to an artificial data set as well as several gene expression data sets derived from DNA microarray experiments. In addition to estimating orderings, the techniques we describe also provide useful heuristics for assessing relevant properties of sample datasets such as noise and sampling intensity, and we show how a data structure called a PQ-tree can be used to represent uncertainty in a reconstructed ordering. AVAILABILITY: Academic implementations of the ordering algorithms are available as source code (in the programming language Python) on our web site, along with documentation on their use. The artificial 'jelly roll' data set upon which the algorithm was tested is also available from this web site. The publicly available gene expression data may be found at http://genome-www.stanford.edu/cellcycle/ and http://caulobacter.stanford.edu/CellCycle/.
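As a generic illustration of ordering via a minimum spanning tree (not the paper's specific modifications), the sketch below builds a Euclidean MST with Prim's algorithm and takes the tree's diameter path (longest path by hop count, found with the standard two-sweep trick) as the estimated temporal ordering of the samples.

```python
import math
from collections import defaultdict

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph over points."""
    n = len(points)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        u, v = min(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: math.dist(points[e[0]], points[e[1]]),
        )
        in_tree.add(v)
        edges.append((u, v))
    return edges

def diameter_path(edges):
    """Longest simple path in a tree, via two depth-first sweeps from
    an arbitrary node (hop count, not edge weight, for simplicity)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    def farthest(src):
        seen, stack, best = {src}, [(src, [src])], [src]
        while stack:
            node, path = stack.pop()
            if len(path) > len(best):
                best = path
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, path + [nxt]))
        return best
    return farthest(farthest(0)[-1])

# Samples from a smooth trajectory, given in scrambled order; the MST
# diameter path recovers the underlying ordering (up to reversal).
points = [(3, 9), (0, 0), (5, 25), (1, 1), (4, 16), (2, 4)]
print(diameter_path(mst_edges(points)))
```

On noisy data the MST sprouts short side branches, which is one reason the paper works with modified spanning trees and represents residual ordering uncertainty with a PQ-tree rather than a single path.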

