Similar articles
20 similar articles found
1.
A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and their classification performance was assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and that the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance.
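
As a rough illustration of the difference between the two partitioning schemes, the sketch below splits training-sample indices for an ensemble. The abstract does not spell out the partitioning rule, so the cyclic sliding-window overlap used here, the overlap size, and the function names `naive_partitions` and `overlapped_partitions` are assumptions made for illustration only.

```python
def naive_partitions(n_samples, n_parts):
    """Split indices 0..n_samples-1 into n_parts disjoint, contiguous blocks.

    Any remainder (when n_samples is not divisible by n_parts) is dropped.
    """
    size = n_samples // n_parts
    return [list(range(i * size, (i + 1) * size)) for i in range(n_parts)]


def overlapped_partitions(n_samples, n_parts, overlap):
    """Hypothetical overlapped scheme: each block is extended with `overlap`
    extra samples borrowed (cyclically) from the samples that follow it."""
    size = n_samples // n_parts
    parts = []
    for i in range(n_parts):
        start = i * size
        parts.append([(start + j) % n_samples for j in range(size + overlap)])
    return parts


# 900 training samples split across 5 ensemble members (sizes are assumptions).
naive = naive_partitions(900, 5)
overlapped = overlapped_partitions(900, 5, overlap=180)
print(len(naive[0]), len(overlapped[0]))
```

Under these assumptions each ensemble member sees more samples in the overlapped scheme than in the naive one, which is one plausible reading of why the overlapped ensemble could cope with a smaller training set.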

2.
Metagenomics is an emerging field in which the power of genomic analysis is applied to an entire microbial community, bypassing the need to isolate and culture individual microbial species. Assembly of metagenomic DNA fragments is very much like the overlap-layout-consensus procedure for assembling isolated genomes, but is augmented by an additional binning step to differentiate scaffolds, contigs and unassembled reads into various taxonomic groups. In this paper, we employed n-mer oligonucleotide frequencies as the features and developed a hierarchical classifier (PCAHIER) for binning short (≤ 1,000 bp) metagenomic fragments. Principal component analysis was used to reduce the high dimensionality of the feature space. The hierarchical classifier consists of four layers of local classifiers that are implemented based on linear discriminant analysis. These local classifiers are responsible for binning prokaryotic DNA fragments into superkingdoms, fragments of the same superkingdom into phyla, fragments of the same phylum into genera, and fragments of the same genus into species, respectively. We evaluated the performance of the PCAHIER by using our own simulated data sets as well as the widely used simHC synthetic metagenome data set from the IMG/M system. The effectiveness of the PCAHIER was demonstrated through comparisons against a non-hierarchical classifier, and two existing binning algorithms (TETRA and Phylopythia).
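
To make the feature construction concrete, here is a minimal sketch of turning a DNA fragment into an n-mer frequency vector. The value of n, the handling of ambiguous bases, and the function name `nmer_frequencies` are assumptions; the abstract does not state them, and the PCA and LDA stages are not shown.

```python
from itertools import product


def nmer_frequencies(sequence, n=4):
    """Return relative frequencies of all 4**n DNA n-mers in `sequence`.

    Windows containing characters other than A, C, G, T are skipped.
    """
    alphabet = "ACGT"
    counts = {"".join(p): 0 for p in product(alphabet, repeat=n)}
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    # Fixed lexicographic order so every fragment yields a comparable vector.
    return [counts[k] / total for k in sorted(counts)]


# Dinucleotide (n=2) features for a short made-up fragment.
features = nmer_frequencies("ACGTACGTNACGT", n=2)
```

The vector length grows as 4**n, which is the high dimensionality that a reduction step such as PCA would then address.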

3.
UV-resonance Raman spectroscopy is applied as a method for the identification of lactic acid bacteria from yogurt. Eight different strains of bacteria from Lactobacillus acidophilus, L. delbrueckii ssp. bulgaricus, and Streptococcus thermophilus were investigated. At an excitation wavelength of 244 nm signals from nucleic acids and proteins are selectively enhanced. Classification was accomplished using different chemometric methods. In a first attempt, the unsupervised methods hierarchical cluster analysis and principal component analysis were applied to investigate natural grouping in the data. In a second step the spectra were analyzed using several supervised methods: K-nearest neighbor classifier, nearest mean classifier, linear discriminant analysis, and support vector machines.

4.
5.
6.
A close relationship has been found between the 3D collagen structure and physiological condition of articular cartilage (AC). Studying the 3D collagen network in AC offers a way to determine the condition of the cartilage. However, traditional qualitative studies are time consuming and subjective. This study aims to develop a computer vision-based classifier to automatically determine the condition of AC tissue based on the structural characteristics of the collagen network. Texture analysis was applied to quantitatively characterise the 3D collagen structure in normal (International Cartilage Repair Society, ICRS, grade 0), aged (ICRS grade 1) and osteoarthritic cartilages (ICRS grade 2). Principal component techniques and linear discriminant analysis were then used to classify the microstructural characteristics of the 3D collagen meshwork and the condition of the AC. The 3D collagen meshwork in the three physiological condition groups displayed distinctive characteristics. Texture analysis indicated a significant difference in the mean texture parameters of the 3D collagen network between groups. The principal component and linear discriminant analysis of the texture data allowed for the development of a classifier for identifying the physiological status of the AC with an expected prediction error of 4.23%. An automatic image analysis classifier has been developed to predict the physiological condition of AC (from ICRS grade 0 to 2) based on texture data from the 3D collagen network in the tissue.

7.
Lin W, Wu FX, Shi J, Ding J, Zhang W. Proteomics 2011, 11(19): 3773-3778
In our recent work on denoising, a linear combination of five features was used to adjust the peak intensities in tandem mass spectra. Although the method showed promise, the coefficients (weights) of the linear combination were fixed and determined empirically. In this paper, we proposed an adaptive approach for estimating these weights. The proposed approach: (i) calculates the score for each peak in a data set with the previous empirically determined weights, (ii) selects the training data set based on the scores of peaks, (iii) applies linear discriminant analysis to the training data set and takes the solution of the linear discriminant analysis as the new weights, (iv) calculates the score again with the new weights, (v) repeats (ii)-(iv) until the weights have no significant change. After getting the final weights, the proposed approach follows the previous methods. The proposed approach was applied to two tandem mass spectra data sets: ISB (with low resolution) and TOV-Q (with high resolution) to evaluate its performance. The results show that about 66% of peaks (likely noise peaks) can be removed and that the number of peptides identified by MASCOT increases by 14 and 23.4% for the ISB and TOV-Q data sets, respectively, compared to the previous work.
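
The loop in steps (i)-(v) can be sketched as follows. This is only a skeleton under loud assumptions: the abstract does not say how the training set is chosen from the scores, so the sketch takes the top and bottom quarter of peaks; it uses two toy features rather than five; and a normalised difference of class means stands in for the LDA solution (equivalent to LDA only under an identity within-class covariance). All names and numbers are hypothetical.

```python
def score(peak, weights):
    """Linear combination of a peak's feature values."""
    return sum(w * f for w, f in zip(weights, peak))


def mean_difference(high, low):
    """Stand-in for the LDA solution (assumption): difference of class means,
    scaled so that the absolute weights sum to 1."""
    dim = len(high[0])
    diff = [sum(p[j] for p in high) / len(high)
            - sum(p[j] for p in low) / len(low) for j in range(dim)]
    norm = sum(abs(d) for d in diff)
    return [d / norm for d in diff] if norm > 0 else diff


def adapt_weights(peaks, weights, frac=0.25, tol=1e-9, max_iter=100):
    """Re-estimate the weights until they stop changing (or max_iter is hit)."""
    for _ in range(max_iter):
        ranked = sorted(peaks, key=lambda p: score(p, weights))  # steps (i)/(iv)
        k = max(1, int(len(ranked) * frac))
        low, high = ranked[:k], ranked[-k:]                      # step (ii)
        new_weights = mean_difference(high, low)                 # step (iii)
        change = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if change < tol:                                         # step (v)
            break
    return weights


# Made-up two-feature peaks: a high-scoring cluster and a low-scoring cluster.
peaks = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.85, 0.7),
         (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.15, 0.3)]
final = adapt_weights(peaks, [0.5, 0.5])
```

The `max_iter` cap is a design choice of this sketch so that the loop always terminates even if the weights oscillate.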

8.
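Assessment of urban vegetation stress based on Hyperion hyperspectral data   Cited by: 1 (self-citations: 0, citations by others: 1)
Rapidly acquiring the stress status of urban vegetation is important not only for maintaining the health of urban vegetation but also for improving the urban ecological environment. Based on an analysis of the physiological and spectral characteristics of stressed vegetation, 14 stress-related hyperspectral vegetation indices were computed from spaceborne Hyperion hyperspectral data. A back-propagation (BP) neural network algorithm was then used to build a classifier of urban vegetation stress intensity, which was applied to identify and analyze the stress intensity of urban vegetation. The results show that vegetation in the central commercial and residential districts of the city is markedly more stressed than vegetation in the urban-rural fringe and the suburbs; that vegetation stress is distributed in rings around the periphery of large green spaces; and that the classifier reflects vegetation stress intensity fairly accurately, offering a relatively reliable and fast method for monitoring vegetation stress over large urban areas.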

9.
We present the application of a nonparametric method to performing functional principal component analysis for functional curve data that consist of measurements of a random trajectory for a sample of subjects. This design typically consists of an irregular grid of time points on which repeated measurements are taken for a number of subjects. We introduce shrinkage estimates for the functional principal component scores that serve as the random effects in the model. Scatterplot smoothing methods are used to estimate the mean function and covariance surface of this model. We propose improved estimation in the neighborhood of and at the diagonal of the covariance surface, where the measurement errors are reflected. The presence of additive measurement errors motivates shrinkage estimates for the functional principal component scores. Shrinkage estimates are developed through best linear prediction and in a generalized version, aiming at minimizing one-curve-leave-out prediction error. The estimation of individual trajectories combines data obtained from that individual as well as all other individuals. We apply our methods to new data regarding the analysis of the level of 14C-folate in plasma as a function of time since dosing of healthy adults with a small tracer dose of 14C-folic acid. A time transformation was incorporated to handle design irregularity concerning the time points on which the measurements were taken. The proposed methodology, incorporating shrinkage and data-adaptive features, is seen to be well suited for describing population kinetics of 14C-folate-specific activity and random effects, and can also be applied to other functional data analysis problems.

10.
A natural way to study protein sequence, structure, and function is to put them in the context of evolution. Homologs inherit similarities from their common ancestor, while analogs converge to similar structures due to a limited number of energetically favorable ways to pack secondary structural elements. Using novel strategies, we previously assembled two reliable databases of homologs and analogs. In this study, we compare these two data sets and develop a support vector machine (SVM)-based classifier to discriminate between homologs and analogs. The classifier uses a number of well-known similarity scores. We observe that although both structure scores and sequence scores contribute to SVM performance, profile sequence scores computed based on structural alignments are the best discriminators between remote homologs and structural analogs. We apply our classifier to a representative set from the expert-constructed database, Structural Classification of Proteins (SCOP). The SVM classifier recovers 76% of the remote homologs defined as domains in the same SCOP superfamily but from different families. More importantly, we also detect and discuss interesting homologous relationships between SCOP domains from different superfamilies, folds, and even classes.

11.
Single-nucleotide polymorphisms (SNPs), believed to determine human differences, are widely used to predict risk of diseases. Typically, clinical samples are limited and/or the sampling cost is high. Thus, it is essential to determine an adequate sample size needed to build a classifier based on SNPs. Such a classifier would facilitate correct classifications, while keeping the sample size to a minimum, thereby making the studies cost-effective. For coded SNP data from 2 classes, an optimal classifier and an approximation to its probability of correct classification (PCC) are derived. A linear classifier is constructed and an approximation to its PCC is also derived. These approximations are validated through a variety of Monte Carlo simulations. A sample size determination algorithm based on the criterion, which ensures that the difference between the 2 approximate PCCs is below a threshold, is given and its effectiveness is illustrated via simulations. For the HapMap data on Chinese and Japanese populations, a linear classifier is built using 51 independent SNPs, and the required total sample sizes are determined using our algorithm, as the threshold varies. For example, when the threshold value is 0.05, our algorithm determines a total sample size of 166 (83 for Chinese and 83 for Japanese) that satisfies the criterion.

12.
MOTIVATION: Ranking gene feature sets is a key issue for both phenotype classification, for instance, tumor classification in a DNA microarray experiment, and prediction in the context of genetic regulatory networks. Two broad methods are available to estimate the error (misclassification rate) of a classifier. Resubstitution fits a single classifier to the data, and applies this classifier in turn to each data observation. Cross-validation (in leave-one-out form) removes each observation in turn, constructs the classifier, and then computes whether this leave-one-out classifier correctly classifies the deleted observation. Resubstitution typically underestimates classifier error, severely so in many cases. Cross-validation has the advantage of producing an effectively unbiased error estimate, but the estimate is highly variable. In many applications it is not the misclassification rate per se that is of interest, but rather the construction of gene sets that have the potential to classify or predict. Hence, one needs to rank feature sets based on their performance. RESULTS: A model-based approach is used to compare the ranking performances of resubstitution and cross-validation for classification based on real-valued feature sets and for prediction in the context of probabilistic Boolean networks (PBNs). For classification, a Gaussian model is considered, along with classification via linear discriminant analysis and the 3-nearest-neighbor classification rule. Prediction is examined in the steady-distribution of a PBN. Three metrics are proposed to compare feature-set ranking based on error estimation with ranking based on the true error, which is known owing to the model-based approach. In all cases, resubstitution is competitive with cross-validation relative to ranking accuracy. This is in addition to the enormous savings in computation time afforded by resubstitution.
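
The two error estimators described above can be contrasted with a small sketch. It uses a plain 3-nearest-neighbor rule on a made-up one-feature data set; the data values and function names are assumptions for illustration and are not taken from the study.

```python
def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of (feature_vector, label) pairs with labels 0 or 1.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def resubstitution_error(data, k=3):
    """Fit on all observations, then test on those same observations."""
    wrong = sum(knn_predict(data, x, k) != y for x, y in data)
    return wrong / len(data)


def leave_one_out_error(data, k=3):
    """Hold out each observation in turn and classify it using the rest."""
    wrong = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        wrong += knn_predict(rest, x, k) != y
    return wrong / len(data)


# Toy data with overlapping classes (values invented for this example).
toy = [([0.0], 0), ([0.9], 0), ([1.6], 1), ([2.0], 0),
       ([3.2], 1), ([3.5], 0), ([4.2], 1), ([5.0], 1)]
print(resubstitution_error(toy), leave_one_out_error(toy))
```

On this toy data the resubstitution estimate comes out lower than the leave-one-out estimate, because under resubstitution every point votes for itself. Leave-one-out also refits the rule once per observation, which is where the extra computation comes from.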

13.
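Objective: To explore the feasibility of applying statistical learning methods to the large volumes of data produced by psychological tests and, based on the findings, to further investigate the personality characteristics of the flying profession, providing new ideas for the selection and assessment of flight personnel. Methods: A total of 1020 male participants were randomly drawn from an airline, comprising 510 flight personnel and 510 non-flight personnel, and were tested with the Cattell 16 Personality Factor questionnaire. The resulting 16 factor scores were then learned with a support vector machine on randomly partitioned training and test groups, and the learning results were analyzed. Results: Four factors were selected as the characteristic features for classification; the classifier built on a linear support vector machine achieved a mean accuracy of 64% under cross-validation. Conclusion: The classifier built with the SVM has a certain degree of reliability and validity.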

14.
MOTIVATION: Subcellular protein localization data are critical to the quantitative understanding of cellular function and regulation. Such data are acquired via observation and quantitative analysis of fluorescently labeled proteins in living cells. Differentiation of labeled protein from cellular artifacts remains an obstacle to accurate quantification. We have developed a novel hybrid machine-learning-based method to differentiate signal from artifact in membrane protein localization data by deriving positional information via surface fitting and combining this with fluorescence-intensity-based data to generate input for a support vector machine. RESULTS: We have employed this classifier to analyze signaling protein localization in T-cell activation. Our classifier displayed increased performance over previously available techniques, exhibiting both flexibility and adaptability: training on heterogeneous data yielded a general classifier with good overall performance; training on more specific data yielded an extremely high-performance specific classifier. We also demonstrate accurate automated learning utilizing additional experimental data.

15.
The availability of age-matched normative data is an essential component of clinical gait analyses. Comparison of normative gait databases is difficult due to the high-dimensionality and temporal nature of the various gait waveforms. The purpose of this study was to provide a method of comparing the sagittal joint angle data between two normative databases. We compared a modern gait database to the historical San Diego database using statistical classifiers developed by Tingley et al. (2002). Gait data were recorded from 60 children aged 1–13 years. A six-camera Vicon 512 motion analysis system and two force plates were utilized to obtain temporal-spatial, kinematic, and kinetic parameters during walking. Differences between the two normative data sets were explored using the classifier index scores, and the mean and covariance structure of the joint angle data from each lab. Significant differences in sagittal angle data between the two databases were identified and attributed to technological advances and data processing techniques (data smoothing, sampling, and joint angle approximations). This work provides a simple method of database comparison using trainable statistical classifiers.

16.
An anthropometric assessment of Huntington's disease patients and families   Cited by: 2 (self-citations: 0, citations by others: 2)
An anthropometric investigation was designed to evaluate patterns of physical deterioration in Huntington's disease (HD). In this study a comprehensive set of measurements was taken including height, weight, body circumferences, skinfold thickness, and craniofacial, linear, and breadth components of the body, on 44 normal, 26 affected, and 70 at-risk individuals between 14 and 88 years of age. The anthropometric data were converted to z-scores using standards to adjust for age and sex differences. These scores were then adjusted for inter-family variation. There were significant differences among normal and affected individuals for all dimensions of body mass, as well as for several craniofacial and linear components of the body. Several significant differences were also found between normals and particular age cohorts of at-risk persons. HD gene carrier status was further assessed by factor analysis of the adjusted scores.
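
The z-score conversion mentioned above amounts to subtracting a reference mean and dividing by a reference standard deviation for the matching age and sex group. A minimal sketch follows; the reference table, its age bands, and its values are hypothetical placeholders, not the standards used in the study, and the later adjustment for inter-family variation is not shown.

```python
# Hypothetical reference standards: (sex, age band) -> (mean, standard deviation).
HEIGHT_STANDARDS_CM = {
    ("F", "20-39"): (163.0, 6.0),
    ("M", "20-39"): (177.0, 7.0),
}


def height_z_score(height_cm, sex, age_band):
    """Express a height as standard deviations from its age/sex reference mean."""
    mean, sd = HEIGHT_STANDARDS_CM[(sex, age_band)]
    return (height_cm - mean) / sd


z = height_z_score(169.0, "F", "20-39")
```

Once every measurement is on this common scale, individuals of different ages and sexes can be compared directly.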

17.
For practical construction of complex synthetic genetic networks able to perform elaborate functions it is important to have a pool of relatively simple modules with different functionality which can be compounded together. To complement engineering of very different existing synthetic genetic devices such as switches, oscillators or logical gates, we propose and develop here a design of synthetic multi-input classifier based on a recently introduced distributed classifier concept. A heterogeneous population of cells acts as a single classifier, whose output is obtained by summarizing the outputs of individual cells. The learning ability is achieved by pruning the population, instead of tuning parameters of an individual cell. The present paper is focused on evaluating two possible schemes of multi-input gene classifier circuits. We demonstrate their suitability for implementing a multi-input distributed classifier capable of separating data which are inseparable for single-input classifiers, and characterize performance of the classifiers by analytical and numerical results. The simpler scheme implements a linear classifier in a single cell and is targeted at separable classification problems with simple class borders. A hard learning strategy is used to train a distributed classifier by removing from the population any cell answering incorrectly to at least one training example. The other scheme implements a circuit with a bell-shaped response in a single cell to allow potentially arbitrary shape of the classification border in the input space of a distributed classifier. Inseparable classification problems are addressed using a soft learning strategy, characterized by a probabilistic decision to keep or discard a cell at each training iteration. We expect that our classifier design contributes to the development of robust and predictable synthetic biosensors, which have the potential to affect applications in many fields, including medicine and industry.
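
The hard learning strategy can be sketched in software as pruning a random population of linear threshold units. This is a toy analogy under stated assumptions: real cells are modelled here as tuples of two weights and a bias drawn uniformly at random, and the population size, parameter ranges, and training examples are invented for illustration.

```python
import random


def make_cell(rng):
    """A hypothetical cell: a two-input linear threshold unit with random parameters."""
    return (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))


def cell_output(cell, x):
    """Return 1 if the cell responds to input x, else 0."""
    w1, w2, bias = cell
    return 1 if w1 * x[0] + w2 * x[1] + bias > 0 else 0


def hard_prune(population, examples):
    """Hard learning: drop any cell that misclassifies at least one example."""
    return [c for c in population
            if all(cell_output(c, x) == y for x, y in examples)]


def population_output(population, x):
    """Summarize the population as the fraction of cells responding to x."""
    if not population:
        return 0.0
    return sum(cell_output(c, x) for c in population) / len(population)


rng = random.Random(0)
population = [make_cell(rng) for _ in range(2000)]
examples = [((0.1, 0.1), 0), ((0.2, 0.3), 0), ((0.8, 0.9), 1), ((0.9, 0.7), 1)]
survivors = hard_prune(population, examples)
```

No individual cell is tuned; learning happens only by removing cells. A soft strategy would instead keep or discard each erring cell with some probability at every iteration, which this sketch does not implement.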

18.
In this technical note, we investigate a combination of PCA with SVM to classify gait patterns based on kinetic data. The gait data of 30 young and 30 elderly participants were recorded using a strain gauge force platform during normal walking. The gait features were first extracted from the recorded vertical foot-ground reaction force curves using PCA, and then these extracted features were adopted to develop the SVM gait classifier. The test results indicated that the PCA-based SVM achieved an average performance of 90% in recognizing young-elderly gait patterns, a markedly improved performance over an artificial neural network-based classifier. The classification ability of the SVM with polynomial and radial basis function kernels was superior to that of the SVM with a linear kernel. These results suggest that the proposed technique could provide an effective tool for gait classification in future clinical applications.

19.
The collection of IR spectra through microscope optics and the visualization of the IR data by IR imaging represent a visualization approach, which uses infrared spectral features as a native intrinsic contrast mechanism. To illustrate the potential of this spectroscopic methodology in breast cancer research, we have acquired IR-microspectroscopic data from benign and malignant lesions in breast tissue sections by point microscopy with spot sizes of 30-40 µm. Four classes of distinct breast tissue spectra were defined and stored in the database: fibroadenoma (a total of 1175 spectra from 14 patients), ductal carcinoma in situ (a total of 1349 spectra from 8 patients), connective tissue (a total of 464 spectra), and adipose tissue (a total of 146 spectra). Artificial neural network analysis, a supervised pattern recognition method, was used to develop an automated classifier to separate the four classes. After training the artificial neural network classifier, infrared spectra of independent external validation data sets ("unknown spectra") were analyzed. In this way, all spectra (a total of 386) taken from micro areas inside the epithelium of fibroadenomas from 4 patients were correctly classified. Out of the 421 spectra taken from micro areas of the in situ component of invasive ductal carcinomas of 3 patients, 93% were correctly identified. Based on these results, the potential of the IR-microspectroscopic approach for diagnosing breast tissue lesions is discussed.

20.