Similar articles
 20 similar articles found
1.
Model selection using wavelet decomposition and applications   Cited by: 1 (self-citations: 0, citations by others: 1)

2.
This paper focuses on the problem of selecting relevant features extracted from human polysomnographic (PSG) signals to perform accurate sleep/wake stage classification. Various features were extracted from the electroencephalogram (EEG), the electro-oculogram (EOG) and the electromyogram (EMG), processed in the frequency and time domains, using a database of 47 night sleep recordings obtained from healthy adults in laboratory settings. Multiple iterative feature selection and supervised classification methods were applied together with a systematic statistical assessment of the classification performances. Our results show that using a simple set of features such as relative EEG powers in five frequency bands yields an agreement of 71% with the whole database classification of two human experts. These performances are within the range of existing classification systems. The addition of features extracted from the EOG and EMG signals makes it possible to reach about 80% agreement with the expert classification. The most significant improvement in classification accuracy is obtained on NREM sleep stage I, a stage of transition between sleep and wakefulness.

3.
4.
5.

Background  

Feature selection is a pattern recognition approach for choosing important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). Many genomic and proteomic applications rely on feature selection to answer questions such as selecting signature genes that are informative about some biological state, e.g., normal tissues and several types of cancer, or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to identify the best solution for each application.

6.

Background  

The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data.

7.
8.
The pivot shift test reproduces a complex instability of the knee joint following rupture of the anterior cruciate ligament. The grade of the pivot shift test has been shown to correlate with subjective criteria of knee joint function, return to physical activity and long-term outcome. This severity is represented by a grade that is attributed by a clinician in a subjective manner, rendering the pivot shift test poorly reliable. The purpose of this study was to unveil the kinematic parameters that are evaluated by clinicians when they establish a pivot shift grade. To do so, eight orthopaedic surgeons performed a total of 127 pivot shift examinations on 70 subjects presenting various degrees of knee joint instability. The knee joint kinematics were recorded using electromagnetic sensors, and principal component analysis was used to determine which features explain most of the variability between recordings. Four principal components were found to account for most of this variability (69%), with only the first showing a correlation with the pivot shift grade (r=0.55). Acceleration and velocity of tibial translation were found to be the features that best correlate with the first principal component, meaning they are the most useful for distinguishing different recordings. The magnitudes of the tibial translation and rotation were amongst those that accounted for the least variability. These results indicate that future efforts to quantify the pivot shift should focus more on the velocity and acceleration of tibial translation and less on the traditionally accepted parameters, namely the magnitudes of posterior translation and external tibial rotation.

9.
Molecular portraits, such as mRNA expression or DNA methylation patterns, have been shown to be strongly correlated with phenotypical parameters. These molecular patterns can be revealed routinely on a genomic scale. However, class prediction based on these patterns is an under-determined problem, due to the extremely high dimensionality of the data compared to the usually small number of available samples. This makes a reduction of the data dimensionality necessary. Here we demonstrate how phenotypic classes can be predicted by combining feature selection and discriminant analysis. By comparing several feature selection methods we show that the right dimension reduction strategy is of crucial importance for the classification performance. The techniques are demonstrated by methylation-pattern-based discrimination between acute lymphoblastic leukemia and acute myeloid leukemia.

10.
Directional selection and the site-frequency spectrum.   Cited by: 4 (self-citations: 0, citations by others: 4)
C D Bustamante  J Wakeley  S Sawyer  D L Hartl 《Genetics》2001,159(4):1779-1788
In this article we explore statistical properties of the maximum-likelihood estimates (MLEs) of the selection and mutation parameters in a Poisson random field population genetics model of directional selection at DNA sites. We derive the asymptotic variances and covariance of the MLEs and explore the power of the likelihood ratio tests (LRT) of neutrality for varying levels of mutation and selection as well as the robustness of the LRT to deviations from the assumption of free recombination among sites. We also discuss the coverage of confidence intervals on the basis of two standard-likelihood methods. We find that the LRT has high power to detect deviations from neutrality and that the maximum-likelihood estimation performs very well when the ancestral states of all mutations in the sample are known. When the ancestral states are not known, the test has high power to detect deviations from neutrality for negative selection but not for positive selection. We also find that the LRT is not robust to deviations from the assumption of independence among sites.
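As a rough illustration of the likelihood ratio test mentioned in this abstract, the sketch below tests whether a set of Poisson counts is consistent with a fixed null rate. It is not the Poisson random field model from the paper: the single-rate Poisson likelihood, the function names, and the one-degree-of-freedom chi-square reference distribution are all assumptions chosen to keep the example small.

```python
import math


def poisson_loglik(counts, rate):
    """Log-likelihood of independent Poisson counts under a common rate."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


def likelihood_ratio_test(counts, null_rate):
    """Return (statistic, p_value) for H0: rate == null_rate.

    Hypothetical helper: the alternative leaves the rate free, so its MLE
    is the sample mean and the test has one degree of freedom.
    """
    if sum(counts) == 0:
        raise ValueError("need at least one non-zero count for this sketch")
    mle_rate = sum(counts) / len(counts)
    stat = 2.0 * (poisson_loglik(counts, mle_rate) - poisson_loglik(counts, null_rate))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p = likelihood_ratio_test([10] * 20, null_rate=2.0)
print(stat, p)
```

A large statistic (and small p-value) indicates that the data are poorly explained by the null rate, which is the same logic the abstract applies to the null hypothesis of neutrality.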

11.
An efficient wavelet-based feature selection (FS) method is proposed in this paper for subject recognition using ground reaction force measurements. Our approach relies on a local fuzzy evaluation measure with respect to patterns that reveal the adequacy of data coverage for each feature. Furthermore, FS is driven by a fuzzy complementary criterion (FuzCoC) which assures that features are introduced iteratively, each providing the maximum additional contribution with regard to the information content given by the previously selected features. On the basis of the principles of FuzCoC, we develop two novel techniques. At Stage 1, wavelet packet (WP) decomposition of gaits is accomplished to obtain a set of discriminating frequency sub-bands. A computationally simple FS method is then applied at Stage 2, providing a compact set of powerful and complementary features from WP coefficients. The quality of our approach is validated via comparative analysis against existing methods on gait recognition.

12.
Wavelet analysis is a powerful tool for analyzing and detecting features of signals characterized by time-dependent statistical properties, such as biomedical signals. The identification and analysis of the components of these signals in the time–frequency domain gives meaningful information about the physiological mechanisms that govern them. This article presents the results of the wavelet analysis applied to the a-wave component of the human electroretinogram. In order to deepen and improve our knowledge about the behavior of the early photoreceptoral response, including the possible activation of interactions and correlations among the photoreceptors, we have detected and identified the stable time–frequency components of the a-wave, using six representative values of luminance. The results indicate the occurrence of three frequencies lying in the range 20–200 Hz. The lowest one is attributed to the summed activities of the photoreceptors. The others are weaker and at low luminance one of them does not occur. We relate them to the response of the rods and the cones, whose aggregate activities are non-linear and typically exhibit self-organization under selective stimuli. The identification of the stable frequency components and of their times of occurrence helps to shed light on the complex mechanisms governing the a-wave. The present results are promising for the development of a more refined model of the photoreceptoral activities.

13.
Predicting allergenic proteins using wavelet transform   Cited by: 2 (self-citations: 0, citations by others: 2)
MOTIVATION: With many transgenic proteins introduced today, the ability to predict their potential allergenicity has become an important issue. Previous studies were based on either sequence similarity or the protein motifs identified from known allergen databases. The similarity-based approaches, although able to produce high recall, usually have low prediction precision. Previous motif-based approaches have been shown to improve precision in cross-validation experiments. In this study, a system that combines the advantages of similarity-based and motif-based prediction is described. RESULTS: The new prediction system uses a clustering algorithm that groups the known allergenic proteins into clusters. Proteins within each cluster are assumed to carry one or more common motifs. After a multiple sequence alignment, proteins in each cluster go through a wavelet analysis program whereby conserved motifs are identified. A hidden Markov model (HMM) profile is then prepared for each identified motif. The allergens that do not appear to carry detectable allergen motifs are saved in a small database. The allergenicity of an unknown protein may be predicted by comparing it against the HMM profiles and, if no matching profiles are found, against the small allergen database by BLASTP. Recall above 70% and precision above 90% were observed in cross-validation experiments. Using the entire Swiss-Prot as the query, we predicted about 2000 potential allergens. AVAILABILITY: The software is available upon request from the authors.

14.
Feature selection for the prediction of translation initiation sites   Cited by: 3 (self-citations: 0, citations by others: 3)
Translation initiation sites (TISs) are important signals in cDNA sequences. In many previous attempts to predict TISs in cDNA sequences, three major factors affect the prediction performance: the nature of the cDNA sequence sets, the relevant features selected, and the classification methods used. In this paper, we examine different approaches to select and integrate relevant features for TIS prediction. The top selected significant features include the features from the position weight matrix and the propensity matrix, the number of C nucleotides in the sequence downstream of the ATG, the number of downstream stop codons, the number of upstream ATGs, and the number of some amino acids, such as amino acids A and D. With the numerical data generated from these features, different classification methods, including decision tree, naive Bayes, and support vector machine, were applied to three independent sequence sets. The identified significant features were found to be biologically meaningful, while the experiments showed promising results.

15.
Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used in protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical filtering criteria. In this study, we use two feature selection algorithms based on random forest and support vector machine to identify peptide properties that can be used to improve validation models. We demonstrate that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to the model which used only search engine scores, at the same sensitivity score of 0.8. In addition, we develop classification models based on the physicochemical properties and protein sequence environment of these peptides without using search engine scores. The performance of the best model based on the support vector machine algorithm is at 0.8 AUC, 0.78 accuracy, and 0.7 specificity, suggesting a reasonably accurate classification. The identified properties important to fragmentation and ionization can be either used in independent validation tools or incorporated into peptide sequencing and database search algorithms to improve existing software programs.

16.

Background  

The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry.

17.
The purpose of this retrospective study was to investigate some parameters of neuromuscular performance of the lower limbs in a population cross-section and their relationship to the risk of falls, using a force platform (FP). Individuals from the Lower Franconia population were invited by public advertisement. Out of a total of 1720 invited subjects 50-90 years of age, all tests were successfully completed by 807 women, age 66.4±9.3, and 442 men, age 64.0±9.2. A novel FP measured the time series of vertical forces over 10 s during 3 kinds of tests: tandem stand with eyes closed, knee bends, and chair rise. Proprietary software captured the peak force and calculated the power density distribution (PSD), intended to characterize balance and power through the FP. Grip strength, a common geriatric force test, was measured with a dynamometer for comparison. The parameters were related to the number of falls in the past 12 months in both genders. Mean PSD showed little age dependency and was not related to falls in tandem stance. Peak forces and power over 10 s of knee bends showed a larger age-related decrease in men than in women, and these parameters were related to falls (p<0.001), whereas they were not related to falls in the chair rise test. Chair rise time and grip strength were related to falls in women (p<0.01). The PSD obtained from the tandem test with eyes closed did not provide a sensitive parameter associated with falls. Knee bends may be a meaningful FP screening test that justifies further studies of physical performance related to the risk of falls, whereas chair rise and grip measurements provided inferior information in this study.

18.
This study presents pattern recognition experiments on the electroencephalogram. The components of the feature vector are built from Parcor coefficients, which provide a simple structure of the covariance matrix. The Bayes classifier, which is theoretically optimal in minimizing the error rate, is implemented. The Mahalanobis classifier is also used, based on an averaged covariance matrix. The performance of the classifiers is tested experimentally by computing the error rate.
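The Mahalanobis classifier with an averaged covariance matrix described in this abstract can be sketched as follows. This is a minimal two-dimensional illustration, not the study's implementation: the feature values are made up, the function names are hypothetical, and the "averaged" covariance is assumed here to be the pooled within-class covariance.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance_2d(classes):
    """Within-class covariance averaged over all classes (2D features only)."""
    scatter = [[0.0, 0.0], [0.0, 0.0]]
    total = 0
    for rows in classes.values():
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    scatter[i][j] += d[i] * d[j]
        total += len(rows)
    dof = total - len(classes)  # one mean estimated per class
    return [[scatter[i][j] / dof for j in range(2)] for i in range(2)]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix; assumes it is non-singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def mahalanobis_sq(x, mean, inv_cov):
    """Squared Mahalanobis distance from x to a class mean."""
    d = [x[0] - mean[0], x[1] - mean[1]]
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(2) for j in range(2))


def classify(x, classes):
    """Assign x to the class whose mean is nearest in Mahalanobis distance."""
    inv_cov = inverse_2x2(pooled_covariance_2d(classes))
    return min(classes,
               key=lambda label: mahalanobis_sq(x, mean_vector(classes[label]), inv_cov))


# Made-up training data: two classes of 2D feature vectors.
training = {
    "a": [[0, 0], [1, 0], [0, 1], [1, 1]],
    "b": [[5, 5], [6, 5], [5, 6], [6, 6]],
}
print(classify([1, 2], training))
```

Sharing one covariance matrix across classes keeps the number of estimated parameters small, which matters when few recordings are available per class.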

19.
MOTIVATION: A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and aggressiveness of the baseline correction, which contribute to making peak detection results inconsistent between runs, instrumentation and analysis methods. RESULTS: Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and providing a shape-matching function that provides a 'goodness of fit' coefficient should provide a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks with different scales and amplitudes. By transforming the spectrum into wavelet space, the pattern-matching problem is simplified and in addition provides a powerful technique for identifying and separating the signal from the spike noise and colored noise. This transformation, with the additional information provided by the 2D CWT coefficients can greatly enhance the effective signal-to-noise ratio. Furthermore, with this technique no baseline removal or peak smoothing preprocessing steps are required before peak detection, and this improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated with SELDI-TOF spectra with known polypeptide positions. Comparisons with two other popular algorithms were performed. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping false positive rate low. 
AVAILABILITY: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.
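The idea of detecting peaks in wavelet space without prior baseline removal can be illustrated with a heavily simplified sketch. It computes wavelet coefficients at a single scale and reports local maxima above a threshold. The choice of the Mexican hat (Ricker) wavelet, the single-scale search, the omitted scale normalization, and all names and parameter values are assumptions for illustration; the published algorithm works across multiple scales and is implemented in R.

```python
import math


def ricker(scale, half_width):
    """Sampled Mexican hat wavelet, shifted so its samples sum to zero."""
    pts = [(1.0 - (t / scale) ** 2) * math.exp(-0.5 * (t / scale) ** 2)
           for t in range(-half_width, half_width + 1)]
    mean = sum(pts) / len(pts)
    return [p - mean for p in pts]


def cwt_row(signal, scale):
    """Wavelet coefficients at one scale; positions near the edges stay 0."""
    half = int(5 * scale)
    w = ricker(scale, half)
    out = [0.0] * len(signal)
    for i in range(half, len(signal) - half):
        out[i] = sum(w[k] * signal[i - half + k] for k in range(len(w)))
    return out


def find_peaks(signal, scale, threshold):
    """Indices where the coefficient row has a local maximum above threshold."""
    row = cwt_row(signal, scale)
    return [i for i in range(1, len(row) - 1)
            if row[i] > threshold and row[i] >= row[i - 1] and row[i] > row[i + 1]]


# Synthetic spectrum: a Gaussian peak at index 50 on a sloping baseline.
spectrum = [0.05 * i + 10.0 * math.exp(-0.5 * ((i - 50) / 3.0) ** 2)
            for i in range(100)]
print(find_peaks(spectrum, scale=3, threshold=5.0))
```

Because the sampled wavelet is symmetric and sums to zero, a constant or linearly sloping baseline contributes nothing to the coefficients away from the edges, which is why this sketch needs no separate baseline correction step.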

20.
Desai MM  Plotkin JB 《Genetics》2008,180(4):2175-2191
The distribution of genetic polymorphisms in a population contains information about evolutionary processes. The Poisson random field (PRF) model uses the polymorphism frequency spectrum to infer the mutation rate and the strength of directional selection. The PRF model relies on an infinite-sites approximation that is reasonable for most eukaryotic populations, but that becomes problematic when the mutation rate θ is large (θ ≳ 0.05). Here, we show that at large mutation rates characteristic of microbes and viruses the infinite-sites approximation of the PRF model induces systematic biases that lead it to underestimate negative selection pressures and mutation rates and erroneously infer positive selection. We introduce two new methods that extend our ability to infer selection pressures and mutation rates at large θ: a finite-site modification of the PRF model and a new technique based on diffusion theory. Our methods can be used to infer not only a "weighted average" of selection pressures acting on a gene sequence, but also the distribution of selection pressures across sites. We evaluate the accuracy of our methods, as well as that of the original PRF approach, by comparison with Wright-Fisher simulations.
