20 similar documents found.
1.
Model selection using wavelet decomposition and applications (total citations: 1; self-citations: 0; citations by others: 1)
2.
3.
Megan L. Smith, Megan Ruffley, Anahí Espíndola, David C. Tank, Jack Sullivan, Bryan C. Carstens. Molecular Ecology, 2017, 26(17): 4562-4573
Phylogeographic data sets have grown from tens to thousands of loci in recent years, but extant statistical methods do not take full advantage of these large data sets. For example, approximate Bayesian computation (ABC) is a commonly used method for the explicit comparison of alternate demographic histories, but it is limited by the “curse of dimensionality” and issues related to the simulation and summarization of data when applied to next‐generation sequencing (NGS) data sets. We implement here several improvements to overcome these difficulties. We use a Random Forest (RF) classifier for model selection to circumvent the curse of dimensionality and apply a binned representation of the multidimensional site frequency spectrum (mSFS) to address issues related to the simulation and summarization of large SNP data sets. We evaluate the performance of these improvements using simulation and find low overall error rates (~7%). We then apply the approach to data from Haplotrema vancouverense, a land snail endemic to the Pacific Northwest of North America. Fifteen demographic models were compared, and our results support a model of recent dispersal from coastal to inland rainforests. Our results demonstrate that binning is an effective strategy for the construction of an mSFS and imply that the statistical power of RF when applied to demographic model selection is at least comparable to traditional ABC algorithms. Importantly, by combining these strategies, large sets of models with differing numbers of populations can be evaluated.
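To make the binning idea concrete, here is a minimal Python sketch of a two-population joint SFS and a coarsened (binned) version of it. It assumes each SNP is summarized by its derived-allele count in each population (0..n1 and 0..n2 sampled chromosomes) and uses a hypothetical equal-width binning rule; the abstract does not describe the binning scheme Smith et al. actually used. The flattened, normalized vector is the kind of summary that could be passed to a Random Forest classifier, which is not shown.

```python
def joint_sfs(counts1, counts2, n1, n2):
    """Unbinned 2D SFS: cell [i][j] counts SNPs with i derived copies in pop 1 and j in pop 2."""
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for c1, c2 in zip(counts1, counts2):
        sfs[c1][c2] += 1
    return sfs


def bin_index(count, n, n_bins):
    """Hypothetical equal-width binning of a count in 0..n into n_bins classes."""
    return count * n_bins // (n + 1)


def binned_joint_sfs(counts1, counts2, n1, n2, n_bins):
    """Coarsened 2D SFS with n_bins x n_bins cells instead of (n1+1) x (n2+1)."""
    binned = [[0] * n_bins for _ in range(n_bins)]
    for c1, c2 in zip(counts1, counts2):
        binned[bin_index(c1, n1, n_bins)][bin_index(c2, n2, n_bins)] += 1
    return binned


def flatten_normalized(matrix):
    """Flatten a 2D spectrum into a vector of proportions (e.g. classifier input)."""
    total = sum(sum(row) for row in matrix)
    return [cell / total for row in matrix for cell in row]


# Toy example: five SNPs, four sampled chromosomes per population.
counts1 = [0, 1, 2, 3, 4]
counts2 = [4, 3, 2, 1, 0]
print(binned_joint_sfs(counts1, counts2, 4, 4, 2))  # [[1, 2], [2, 0]]
```

With 2 bins per population the 25-cell spectrum collapses to 4 cells, which is the dimensionality saving that binning is meant to provide.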
4.
Luk Zoubek, Sylvie Charbonnier, Suzanne Lesecq, Alain Buguet, Florian Chapotot. Biomedical Signal Processing and Control, 2007, 2(3): 171-179
This paper focuses on the problem of selecting relevant features extracted from human polysomnographic (PSG) signals to perform accurate sleep/wake stage classification. Various features were extracted from the electroencephalogram (EEG), the electro-oculogram (EOG), and the electromyogram (EMG), processed in the frequency and time domains, using a database of 47 night sleep recordings obtained from healthy adults in laboratory settings. Multiple iterative feature selection and supervised classification methods were applied together with a systematic statistical assessment of classification performance. Our results show that using a simple set of features, such as relative EEG powers in five frequency bands, yields 71% agreement with the classification of the whole database by two human experts. This performance is within the range of existing classification systems. Adding features extracted from the EOG and EMG signals makes it possible to reach about 80% agreement with the expert classification. The most significant improvement in classification accuracy is obtained for NREM sleep stage I, a transitional stage between sleep and wakefulness.
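Relative band power, the simplest feature set mentioned above, is each band's share of spectral power. The sketch below computes it with a naive DFT in plain Python. The five band edges (delta, theta, alpha, sigma, beta) and the sampling rate are assumptions for illustration; the abstract does not list the bands the authors used.

```python
import cmath
import math

# Assumed band edges in Hz (illustrative; not taken from the paper).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "sigma": (12.0, 16.0),
    "beta": (16.0, 30.0),
}


def power_spectrum(x, fs):
    """One-sided power spectrum via a naive O(n^2) DFT; returns (freqs, powers)."""
    n = len(x)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def relative_band_powers(x, fs, bands=BANDS):
    """Power in each band divided by the total power across all listed bands."""
    freqs, powers = power_spectrum(x, fs)
    band_power = {
        name: sum(p for f, p in zip(freqs, powers) if lo <= f < hi)
        for name, (lo, hi) in bands.items()
    }
    total = sum(band_power.values())
    return {name: bp / total for name, bp in band_power.items()}


# Toy epoch: 1 s of a pure 10 Hz sine sampled at 64 Hz.
fs = 64
epoch = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
rel = relative_band_powers(epoch, fs)
print(round(rel["alpha"], 6))  # 1.0
```

Normalizing by the summed band power makes the features insensitive to overall amplitude differences between recordings; a real pipeline would use an FFT and a windowed estimator rather than this O(n^2) loop.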
5.
6.
7.
Background
Feature selection is a pattern recognition approach that chooses important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). Many genomic and proteomic applications rely on feature selection to answer questions such as selecting signature genes that are informative about some biological state (e.g., normal tissues and several types of cancer) or inferring a prediction network among elements such as genes, proteins, and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to identify the best solution for each application.
8.
Feature selection for splice site prediction: A new method using EDA-based feature ranking (total citations: 1; self-citations: 0; citations by others: 1)
Background
The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data.
9.
10.
David R. Labbe, Jacques A de Guise, Neila Mezghani, Véronique Godbout, Guy Grimard, David Baillargeon, Patrick Lavigne, Julio Fernandes, Pierre Ranger, Nicola Hagemeister. Journal of Biomechanics, 2010, 43(16): 3080-3084
The pivot shift test reproduces a complex instability of the knee joint following rupture of the anterior cruciate ligament. The grade of the pivot shift test has been shown to correlate with subjective criteria of knee joint function, return to physical activity, and long-term outcome. This severity is represented by a grade that a clinician assigns subjectively, which makes the pivot shift test poorly reliable. The purpose of this study was to reveal the kinematic parameters that clinicians evaluate when they establish a pivot shift grade. To do so, eight orthopaedic surgeons performed a total of 127 pivot shift examinations on 70 subjects presenting various degrees of knee joint instability. Knee joint kinematics were recorded using electromagnetic sensors, and principal component analysis was used to determine which features explain most of the variability between recordings. Four principal components were found to account for most of this variability (69%), with only the first showing a correlation with the pivot shift grade (r=0.55). Acceleration and velocity of tibial translation were the features that correlated best with the first principal component, meaning they are the most useful for distinguishing different recordings. The magnitudes of tibial translation and rotation were among the features that accounted for the least variability. These results indicate that future efforts to quantify the pivot shift should focus more on the velocity and acceleration of tibial translation and less on the traditionally accepted parameters, namely the magnitudes of posterior translation and external tibial rotation.
11.
Directional selection and the site-frequency spectrum (total citations: 4; self-citations: 0; citations by others: 4)
In this article we explore statistical properties of the maximum-likelihood estimates (MLEs) of the selection and mutation parameters in a Poisson random field population genetics model of directional selection at DNA sites. We derive the asymptotic variances and covariance of the MLEs and explore the power of the likelihood ratio tests (LRT) of neutrality for varying levels of mutation and selection as well as the robustness of the LRT to deviations from the assumption of free recombination among sites. We also discuss the coverage of confidence intervals on the basis of two standard-likelihood methods. We find that the LRT has high power to detect deviations from neutrality and that the maximum-likelihood estimation performs very well when the ancestral states of all mutations in the sample are known. When the ancestral states are not known, the test has high power to detect deviations from neutrality for negative selection but not for positive selection. We also find that the LRT is not robust to deviations from the assumption of independence among sites.
12.
Molecular portraits, such as mRNA expression or DNA methylation patterns, have been shown to be strongly correlated with phenotypic parameters. These molecular patterns can be revealed routinely on a genomic scale. However, class prediction based on these patterns is an under-determined problem, owing to the extremely high dimensionality of the data compared with the usually small number of available samples. This makes a reduction of the data dimensionality necessary. Here we demonstrate how phenotypic classes can be predicted by combining feature selection and discriminant analysis. By comparing several feature selection methods, we show that the right dimension reduction strategy is of crucial importance for classification performance. The techniques are demonstrated by methylation-pattern-based discrimination between acute lymphoblastic leukemia and acute myeloid leukemia.
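As an illustration of a simple filter-type feature selection step that could precede discriminant analysis, the sketch below ranks features in a two-class problem by the Fisher score (squared difference of class means divided by the sum of class variances). This is a generic technique chosen for illustration, not necessarily one of the methods compared in the paper, and the discriminant analysis step itself is omitted.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Fisher score per feature for labels 0/1: (mean1 - mean0)^2 / (var1 + var0)."""
    scores = []
    for j in range(len(X[0])):
        class0 = [row[j] for row, label in zip(X, y) if label == 0]
        class1 = [row[j] for row, label in zip(X, y) if label == 1]
        numerator = (mean(class1) - mean(class0)) ** 2
        denominator = pvariance(class0) + pvariance(class1)
        if denominator > 0:
            scores.append(numerator / denominator)
        else:
            # Zero within-class spread: perfectly separating if the means differ.
            scores.append(float("inf") if numerator > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: 4 samples x 3 features; feature 0 separates the classes best.
X = [[1.0, 5.0, 0.0], [1.2, 4.0, 1.0], [3.0, 5.5, 0.0], [3.2, 4.5, 1.0]]
y = [0, 0, 1, 1]
print(top_k_features(X, y, 2))  # [0, 1]
```

Scoring each feature independently is cheap, which matters when features vastly outnumber samples, but it ignores redundancy between features.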
13.
Moustakidis SP, Theocharis JB, Giakas G. Computer Methods in Biomechanics and Biomedical Engineering, 2012, 15(6): 627-644
An efficient wavelet-based feature selection (FS) method is proposed in this paper for subject recognition using ground reaction force measurements. Our approach relies on a local fuzzy evaluation measure, defined with respect to the patterns, that reveals the adequacy of data coverage for each feature. Furthermore, FS is driven by a fuzzy complementary criterion (FuzCoC), which ensures that features are introduced iteratively, each providing the maximum additional contribution with respect to the information content of the previously selected features. Based on the principles of FuzCoC, we develop two novel techniques. In Stage 1, wavelet packet (WP) decomposition of the gait signals is performed to obtain a set of discriminating frequency sub-bands. A computationally simple FS method is then applied in Stage 2, providing a compact set of powerful and complementary features from the WP coefficients. The quality of our approach is validated via comparative analysis against existing methods for gait recognition.
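A wavelet packet decomposition differs from the ordinary discrete wavelet transform in that both the approximation and the detail branches are split at every level, giving 2**L sub-bands after L levels. The sketch below uses the orthonormal Haar filter for brevity (the abstract does not state which wavelet the authors used) and reports the energy in each sub-band as a candidate feature. Nodes are returned in natural order, which is not strictly increasing frequency order.

```python
import math


def haar_step(x):
    """One level of orthonormal Haar analysis: returns (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_packet(x, levels):
    """Full Haar wavelet packet tree; returns the 2**levels leaf nodes in natural order."""
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])  # split BOTH branches
        nodes = next_nodes
    return nodes


def subband_energies(x, levels):
    """Energy (sum of squared coefficients) of each leaf sub-band."""
    return [sum(c * c for c in node) for node in wavelet_packet(x, levels)]


# Toy signal: a constant puts all energy in the lowest (approximation-only) node.
print([round(e, 6) for e in subband_energies([1.0] * 8, 2)])  # [8.0, 0.0, 0.0, 0.0]
```

Because the Haar filters are orthonormal, the sub-band energies sum to the energy of the input signal, so they can be read as a partition of signal energy across frequency sub-bands.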
14.
Predicting allergenic proteins using wavelet transform (total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: With many transgenic proteins being introduced today, the ability to predict their potential allergenicity has become an important issue. Previous studies were based on either sequence similarity or protein motifs identified from known allergen databases. Similarity-based approaches, although able to achieve high recall, usually have low prediction precision. Previous motif-based approaches have been shown to improve precision in cross-validation experiments. In this study, a system that combines the advantages of similarity-based and motif-based prediction is described. RESULTS: The new prediction system uses a clustering algorithm that groups the known allergenic proteins into clusters. Proteins within each cluster are assumed to carry one or more common motifs. After a multiple sequence alignment, the proteins in each cluster are passed through a wavelet analysis program that identifies conserved motifs. A hidden Markov model (HMM) profile is then prepared for each identified motif. Allergens that do not appear to carry detectable allergen motifs are saved in a small database. The allergenicity of an unknown protein may be predicted by comparing it against the HMM profiles and, if no matching profiles are found, against the small allergen database using BLASTP. Recall above 70% and precision above 90% were observed in cross-validation experiments. Using the entire Swiss-Prot database as the query, we predicted about 2000 potential allergens. AVAILABILITY: The software is available upon request from the authors.
15.
Wavelet analysis is a powerful tool for analyzing and detecting features of signals with time-dependent statistical properties, such as biomedical signals. Identifying and analyzing the components of these signals in the time–frequency domain gives meaningful information about the physiological mechanisms that govern them. This article presents the results of wavelet analysis applied to the a-wave component of the human electroretinogram. To deepen our knowledge of the behavior of the early photoreceptoral response, including the possible activation of interactions and correlations among the photoreceptors, we detected and identified the stable time–frequency components of the a-wave at six representative luminance values. The results indicate three frequencies lying in the range 20–200 Hz. The lowest is attributed to the summed activity of the photoreceptors. The others are weaker, and at low luminance one of them does not occur. We relate them to the responses of the rods and the cones, whose aggregate activities are non-linear and typically exhibit self-organization under selective stimuli. Identifying the stable frequency components and their times of occurrence helps shed light on the complex mechanisms governing the a-wave. The present results are promising toward the formulation of more refined models of photoreceptoral activity.
16.
Fang J, Dong Y, Williams TD, Lushington GH. Journal of Bioinformatics and Computational Biology, 2008, 6(1): 223-240
Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used for protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical filtering criteria. In this study, we use two feature selection algorithms, based on random forest and support vector machine, to identify peptide properties that can be used to improve validation models. We demonstrate that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to a model that used only search engine scores, at the same sensitivity of 0.8. In addition, we develop classification models based on the physicochemical properties and protein sequence environment of these peptides without using search engine scores. The best model, based on the support vector machine algorithm, achieves an AUC of 0.8, an accuracy of 0.78, and a specificity of 0.7, suggesting reasonably accurate classification. The identified properties important to fragmentation and ionization can be either used in independent validation tools or incorporated into peptide sequencing and database search algorithms to improve existing software programs.
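The comparison above fixes sensitivity at 0.8 and counts false positives. The sketch below shows one way to compute that quantity, together with the AUC in its pairwise (Mann-Whitney) form, from classifier scores and binary labels (1 = correct identification). The threshold rule, taking the highest score threshold that reaches the target sensitivity, is an assumption for illustration.

```python
def false_positives_at_sensitivity(scores, labels, target_sensitivity):
    """False positives at the highest score threshold whose sensitivity reaches the target.

    labels: 1 = true identification, 0 = false; a hit is called when score >= threshold.
    """
    n_pos = sum(labels)
    for threshold in sorted(set(scores), reverse=True):
        called = [y for s, y in zip(scores, labels) if s >= threshold]
        if sum(called) / n_pos >= target_sensitivity:
            return called.count(0)
    return labels.count(0)  # only reached if the target exceeds 1.0


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(false_positives_at_sensitivity(scores, labels, 0.8))  # 1
print(round(auc(scores, labels), 4))  # 0.8889
```

Comparing two models at a fixed sensitivity, as the abstract does, means computing this false-positive count for each model's scores with the same target.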
17.
Translation initiation sites (TISs) are important signals in cDNA sequences. In many previous attempts to predict TISs in cDNA sequences, three major factors affect prediction performance: the nature of the cDNA sequence sets, the relevant features selected, and the classification methods used. In this paper, we examine different approaches to select and integrate relevant features for TIS prediction. The top selected significant features include features from the position weight matrix and the propensity matrix, the number of C nucleotides in the sequence downstream of the ATG, the number of downstream stop codons, the number of upstream ATGs, and the counts of certain amino acids, such as A and D. With the numerical data generated from these features, different classification methods, including decision tree, naive Bayes, and support vector machine, were applied to three independent sequence sets. The identified significant features were found to be biologically meaningful, and the experiments showed promising results.
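Several of the listed features are simple counts. The sketch below computes three of them for a single candidate ATG. It assumes the whole sequence on each side of the ATG is used (no fixed window), that upstream ATGs are counted in any frame, and that downstream stop codons are counted in the reading frame of the candidate ATG; the paper's exact definitions may differ. The position weight matrix, propensity matrix, and amino-acid features are not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def tis_features(seq, atg_pos):
    """Three count features for the candidate ATG starting at 0-based index atg_pos."""
    seq = seq.upper()
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at atg_pos")
    upstream = seq[:atg_pos]
    downstream = seq[atg_pos + 3:]
    # Upstream ATGs in any reading frame (assumption).
    upstream_atgs = sum(1 for i in range(len(upstream) - 2) if upstream[i:i + 3] == "ATG")
    # Downstream stop codons in the frame of the candidate ATG (assumption).
    inframe_stops = sum(
        1 for i in range(0, len(downstream) - 2, 3) if downstream[i:i + 3] in STOP_CODONS
    )
    return {
        "downstream_C": downstream.count("C"),
        "downstream_inframe_stops": inframe_stops,
        "upstream_ATGs": upstream_atgs,
    }


print(tis_features("CATGTTATGCCCTAAGGTGA", 6))
# {'downstream_C': 3, 'downstream_inframe_stops': 1, 'upstream_ATGs': 1}
```

In the example, the TGA at the end of the sequence is out of frame relative to the candidate ATG, so only the in-frame TAA is counted.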
18.
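The analysis of high-dimensional protein mass spectrometry data for cancer has long been hampered by high dimensionality. To address the problems that arise when reducing the dimensionality of such data, a feature extraction method based on wavelet analysis and principal component analysis is proposed, and a support vector machine (SVM) is used for classification after feature extraction. When a 2-level wavelet decomposition of the 8-7-02 dataset was performed with the db1, db3, db4, db6, db8, db10, and haar wavelet bases, followed by SVM classification, the accuracies reached 98.18%, 98.35%, 98.04%, 98.36%, 97.89%, 97.96%, and 98.20%, respectively. The method further improves classification accuracy while also improving time efficiency.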
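A 2-level wavelet decomposition of a spectrum can be sketched in plain Python with the Haar basis, one of the bases listed above. The function returns coefficients in the order [cA2, cD2, cD1], the same ordering PyWavelets uses for `pywt.wavedec`, which would be the practical choice for the db3 to db10 bases. Keeping only cA2 shrinks each spectrum to a quarter of its length before the PCA and SVM steps, which are not shown; whether the authors kept only the approximation coefficients is not stated in the abstract.

```python
import math


def haar_dwt(x, level):
    """Multilevel orthonormal Haar DWT; returns [cA_level, cD_level, ..., cD_1]."""
    if len(x) % (2 ** level):
        raise ValueError("signal length must be divisible by 2**level")
    s = 1 / math.sqrt(2)
    approx, details = list(x), []
    for _ in range(level):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) * s for a, b in pairs])  # detail at this level
        approx = [(a + b) * s for a, b in pairs]  # only the approximation is split further
    return [approx] + details


spectrum = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 0.0, 4.0]  # toy stand-in for an MS intensity vector
cA2, cD2, cD1 = haar_dwt(spectrum, 2)
features = cA2  # 8 values reduced to 2 before any PCA/SVM step
print([round(v, 6) for v in features])  # [8.0, 4.0]
```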
19.
Background
The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry.
20.
The purpose of this retrospective study was to investigate several parameters of lower-limb neuromuscular performance in a population cross-section, and their relationship to the risk of falls, using a force platform (FP). Individuals from the Lower Franconia population were invited by public advertisement. Of the 1720 invited subjects aged 50-90 years, all tests were successfully completed by 807 women (age 66.4±9.3 years) and 442 men (age 64.0±9.2 years). A novel FP measured the time series of vertical forces over 10 s during three kinds of tests: tandem stand with eyes closed, knee bends, and chair rise. Proprietary software captured the peak force and calculated the power density distribution (PSD), intended to characterize balance and power through the FP. Grip strength, a common geriatric force test, was measured dynamometrically for comparison. The parameters were related to the number of falls in the past 12 months in both sexes. Mean PSD showed little age dependency and was not related to falls in tandem stance. Peak forces and power over 10 s of knee bends showed a larger age-related decrease in men than in women, and these parameters were related to falls (p<0.001), whereas they were not related to falls in the chair rise test. Chair rise time and grip strength were related to falls in women (p<0.01). The PSD obtained from the tandem test with eyes closed did not provide a sensitive parameter associated with falls. Knee bends may be a meaningful FP screening test that justifies further studies of physical performance related to the risk of falls, whereas chair rise and grip measurements provided inferior information in this study.