Similar literature
Found 20 similar articles (search time: 31 ms)
1.
MOBCENTR is a classification algorithm that combines features of mobile-centre classification, the Ward algorithm, and the Hard-Isodata method. The results of this new algorithm and of the Ward algorithm are compared using morphological characters of species of Solidago and Pimpinella.

2.
Summary: This paper proposes a modified radial basis function classification algorithm for non-linear cancer classification. In the algorithm, a modified simulated annealing method is developed and combined with the linear least-squares and gradient paradigms to optimize the structure of the radial basis function (RBF) classifier. The proposed algorithm can be adopted to perform non-linear cancer classification based on gene expression profiles and is applied to two microarray data sets involving various human tumor classes: (1) normal versus colon tumor; (2) acute myeloid leukemia (AML) versus acute lymphoblastic leukemia (ALL). Finally, the accuracy and stability of the proposed algorithm are further demonstrated by comparison with other cancer classification algorithms.

3.
Particle classification is an important component of multivariate statistical analysis methods that has been used extensively to extract information from electron micrographs of single particles. Here we describe a new Bayesian Gibbs sampling algorithm for the classification of such images. This algorithm, which is applied after dimension reduction by correspondence analysis or by principal components analysis, dynamically learns the parameters of the multivariate Gaussian distributions that characterize each class. These distributions describe tilted ellipsoidal clusters that adaptively adjust shape to capture differences in the variances of factors and the correlations of factors within classes. A novel Bayesian procedure to objectively select factors for inclusion in the classification models is a component of this procedure. A comparison of this algorithm with hierarchical ascendant classification of simulated data sets shows improved classification over a broad range of signal-to-noise ratios.

4.
Fast and robust classification of feature vectors is a crucial task in a number of real-time systems. A cellular neural/nonlinear network universal machine (CNN-UM) can be very efficient as a feature detector. The next step is to post-process the results for object recognition. This paper shows how a robust classification scheme based on adaptive resonance theory (ART) can be mapped to the CNN-UM. Moreover, this mapping is general enough to include different types of feed-forward neural networks. The designed analogic CNN algorithm is capable of classifying the extracted feature vectors keeping the advantages of the ART networks, such as robust, plastic and fault-tolerant behaviors. An analogic algorithm is presented for unsupervised classification with tunable sensitivity and automatic new class creation. The algorithm is extended for supervised classification. The presented binary feature vector classification is implemented on the existing standard CNN-UM chips for fast classification. The experimental evaluation shows promising performance after 100% accuracy on the training set.

5.
Phyloproteomics is a novel analytical tool that solves the issue of comparability between proteomic analyses, utilizes a total spectrum-parsing algorithm, and produces a biologically meaningful classification of specimens. Phyloproteomics employs two algorithms: a new parsing algorithm (UNIPAL) and a phylogenetic algorithm (MIX). By outgroup comparison, the parsing algorithm identifies novel or vanished MS peaks and peaks signifying up- or down-regulated proteins, and scores them as derived or ancestral. The phylogenetic algorithm uses these scores to produce a biologically meaningful classification of the specimens.

6.
A genotype calling algorithm for Affymetrix SNP arrays   (cited 11 times: 0 self-citations, 11 by others)
MOTIVATION: A classification algorithm based on a multi-chip, multi-SNP approach is proposed for Affymetrix SNP arrays. Current procedures for calling genotypes on SNP arrays process all the features associated with one chip and one SNP at a time. Using a large training sample where the genotype labels are known, we develop a supervised learning algorithm to obtain more accurate classification results on new data. The method we propose, RLMM, is based on a robustly fitted linear model and uses the Mahalanobis distance for classification. The chip-to-chip non-biological variance is reduced through normalization. This model-based algorithm captures the similarities across genotype groups and probes, as well as across thousands of SNPs, for accurate classification. In this paper, we apply RLMM to Affymetrix 100K SNP array data, present classification results, and compare them with genotype calls obtained from the Affymetrix procedure DM, as well as with the publicly available genotype calls from the HapMap project.
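
The abstract above names the Mahalanobis distance as the classification rule. A minimal sketch of that idea follows, assuming two-dimensional feature vectors and per-genotype means and covariances that have already been estimated. The function names and the toy cluster values are hypothetical and are not taken from RLMM itself.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point x from a class with the given mean and 2x2 covariance."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    inv = ((d / det, -b / det), (-c / det, a / det))
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

def classify(x, classes):
    """Return the label whose class centre is nearest to x in Mahalanobis distance."""
    return min(classes, key=lambda label: mahalanobis_2d(x, *classes[label]))

# Hypothetical genotype clusters (assumed values): label -> (mean, covariance)
classes = {
    "AA": ((2.0, 0.0), ((0.1, 0.0), (0.0, 0.1))),
    "AB": ((1.0, 1.0), ((0.1, 0.0), (0.0, 0.1))),
    "BB": ((0.0, 2.0), ((0.1, 0.0), (0.0, 0.1))),
}
call = classify((1.9, 0.2), classes)
```

Unlike Euclidean distance, this rule accounts for the spread and correlation of each cluster, so an elongated genotype cluster does not lose points along its long axis to a neighbouring class.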

7.
IRBM, 2020, 41(4): 229-239
Feature selection algorithms are a cornerstone of machine learning. As the number of samples and of their properties grows, a feature selection algorithm selects the significant features; the methods that perform this function are collectively called feature selection algorithms. Their general purpose is to select the most relevant properties of the data classes and to increase classification performance. Thus, features can be selected based on their classification performance. In this study, we have developed a feature selection algorithm, P-Score, based on the classification performance of decision support vectors. The method can work according to two different selection criteria. We tested the classification performance of the features selected with P-Score using three different classifiers. In addition, we compared the performance of P-Score with that of 13 feature selection algorithms from the literature. According to the results of the study, the P-Score feature selection algorithm is a method that can be used in the field of machine learning.

8.
MOTIVATION: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. RESULTS: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used contains over 16 000 genes, and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm performs slightly better than the published results for multi-class classifiers based on support vector machines for this data set. AVAILABILITY: Available on request from the authors.

9.
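Extraction of human tumor informative genes based on SVM and mean impact value   (cited 1 time: 0 self-citations, 1 by others)
Selecting informative genes for tumor classification from gene expression profiles is an important means of discovering tumor-specific expressed genes and exploring tumor gene expression patterns. Tumor diagnosis using classification information obtained from gene expression profiles is an important research direction in bioinformatics and is expected to become a fast and effective method of molecular tumor diagnosis in clinical medicine. Because tumor gene expression profile data are high-dimensional, small in sample size, and noisy, we propose an algorithm that combines support vector machines with the mean impact value to find tumor informative genes. Its advantage is that it can find multiple informative gene subsets that contain as few genes as possible while classifying as well as possible. Binary-class tumor data sets are used to verify the feasibility and effectiveness of the algorithm: for the colon cancer sample set, only 3 genes are needed to reach 100% recognition accuracy under leave-one-out cross-validation. To avoid the influence of different partitions of the sample set on classification performance, a full-fold cross-validation method is further used to evaluate the classification performance of each informative gene subset and to select more reliable subsets. Compared with other tumor classification methods, the experimental results show clear advantages in both the number of informative genes and classification performance.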

10.
We have developed an algorithm called Q5 for probabilistic classification of healthy versus disease whole serum samples using mass spectrometry. The algorithm employs principal components analysis (PCA) followed by linear discriminant analysis (LDA) on whole spectrum surface-enhanced laser desorption/ionization time of flight (SELDI-TOF) mass spectrometry (MS) data and is demonstrated on four real datasets from complete, complex SELDI spectra of human blood serum. Q5 is a closed-form, exact solution to the problem of classification of complete mass spectra of a complex protein mixture. Q5 employs a probabilistic classification algorithm built upon a dimension-reduced linear discriminant analysis. Our solution is computationally efficient; it is noniterative and computes the optimal linear discriminant using closed-form equations. The optimal discriminant is computed and verified for datasets of complete, complex SELDI spectra of human blood serum. Replicate experiments of different training/testing splits of each dataset are employed to verify robustness of the algorithm. The probabilistic classification method achieves excellent performance. We achieve sensitivity, specificity, and positive predictive values above 97% on three ovarian cancer datasets and one prostate cancer dataset. The Q5 method outperforms previous full-spectrum complex sample spectral classification techniques and can provide clues as to the molecular identities of differentially expressed proteins and peptides.
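
The sensitivity, specificity, and positive predictive value quoted in the abstract above are all ratios of confusion-matrix counts. A minimal sketch of how they are computed is given here; the function name and the toy labels are hypothetical and do not reproduce any Q5 result.

```python
def diagnostic_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity, and positive predictive value for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of disease samples detected
        "specificity": ratio(tn, tn + fp),  # fraction of healthy samples cleared
        "ppv": ratio(tp, tp + fp),          # fraction of positive calls that are correct
    }

# Hypothetical labels (assumed): 1 = disease, 0 = healthy
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```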

11.
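GA/WV: a feature gene selection algorithm for mining cancer gene expression profiles   (cited 1 time: 0 self-citations, 1 by others)
Identifying the set of feature genes in cancer expression profiles can advance research on cancer type classification, and may also give patients better clinical diagnoses. Although some methods have succeeded in gene expression profile analysis, cancer classification from gene expression profile data remains a major challenge, mainly because a general and reliable method for assessing gene importance is lacking. GA/WV is a new method for assessing the classification importance of genes from complex biological expression data. The feature gene set obtained by combining a genetic algorithm (GA) with the weighted voting classification algorithm (WV) is suitable not only for the WV classifier but also for other classifiers. Validation of GA/WV on cancer gene expression profile data sets shows that it is a successful and reliable feature gene selection method.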
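
The weighted voting (WV) classifier mentioned above can be sketched as follows. This uses a common formulation with signal-to-noise weights, in which each gene votes in proportion to how far the sample lies from the midpoint of the two class means. Whether GA/WV uses exactly this variant is an assumption, and the function names and toy data are hypothetical.

```python
from statistics import mean, stdev

def train_weighted_voting(class1, class2):
    """Fit per-gene (weight, boundary) pairs.

    class1 / class2: lists of samples; each sample lists the expression
    values of the selected genes. Assumes each gene varies in at least
    one class, otherwise the weight below divides by zero.
    """
    params = []
    for g in range(len(class1[0])):
        x1 = [s[g] for s in class1]
        x2 = [s[g] for s in class2]
        m1, m2 = mean(x1), mean(x2)
        weight = (m1 - m2) / (stdev(x1) + stdev(x2))  # signal-to-noise weight
        boundary = (m1 + m2) / 2                      # midpoint between class means
        params.append((weight, boundary))
    return params

def predict(params, sample):
    """Sum the per-gene votes; a positive total favours class 1, otherwise class 2."""
    total = sum(w * (x - b) for (w, b), x in zip(params, sample))
    return 1 if total > 0 else 2

# Hypothetical expression values for two selected genes (assumed data)
class1 = [[5.0, 1.0], [6.0, 2.0], [7.0, 1.5]]
class2 = [[1.0, 4.0], [2.0, 5.0], [1.5, 6.0]]
params = train_weighted_voting(class1, class2)
```

In a GA wrapper, a candidate gene subset would typically be scored by the accuracy of such a classifier on held-out samples; that outer loop is omitted here.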

12.
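Early diagnosis of cancer can significantly improve the survival rate of cancer patients, and this is even more evident in patients with hepatocellular carcinoma. Machine learning is an effective tool for cancer classification. Selecting a low-dimensional feature subset with high classification accuracy from complex, high-dimensional cancer data sets is a difficult problem in cancer classification. This paper proposes a two-stage feature selection method, SC-BPSO. The Spearman correlation coefficient and the chi-square test of independence are combined as the evaluation function of a filter, giving a new filter method, the SC filter; the SC filter is then combined with a wrapper method based on binary particle swarm optimization (BPSO) to achieve two-stage feature selection. The method is applied to cancer classification on high-dimensional data, distinguishing normal samples from hepatocellular carcinoma samples. First, 130 liver tissue microRNA sequence data sets (64 hepatocellular carcinoma, 66 normal liver tissue) from the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) are preprocessed, and the MiRME algorithm is used to extract three types of features from the raw sequence files: microRNA expression level, editing level, and post-editing expression level. Then the parameters of the SC-BPSO algorithm are tuned for the hepatocellular carcinoma classification setting and a key feature subset is selected. Finally, classification models are built and results predicted; the feature subsets selected by an information gain filter, an information gain ratio filter, and a BPSO wrapper feature selection algorithm are classified with four classifiers using the same parameters (random forest, support vector machine, decision tree, and KNN), and the classification results are compared. With the feature subset selected by SC-BPSO, classification accuracy reaches 98.4%. The results show that, compared with the other three feature selection algorithms, SC-BPSO effectively finds feature subsets that are smaller and more accurate. This may be of considerable significance for cancer classification problems with few samples and high-dimensional data.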

13.
Models of biological diffusion-reaction systems require accurate classification of the underlying diffusive dynamics (e.g., Fickian, subdiffusive, or superdiffusive). We use a renormalization group operator to identify the anomalous (non-Fickian) diffusion behavior from a short trajectory of a single molecule. The method provides quantitative information about the underlying stochastic process, including its anomalous scaling exponent. The classification algorithm is first validated on simulated trajectories of known scaling. Then it is applied to experimental trajectories of microspheres diffusing in cytoplasm, revealing heterogeneous diffusive dynamics. The simplicity and robustness of this classification algorithm make it an effective tool for analysis of rare stochastic events that occur in complex biological systems.

14.
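Mealiness in apples refers to a series of physical and physiological changes such as softening of the flesh and loss of juice. Hyperspectral scattering imaging combined with a sparse representation classification algorithm (SRSA) was used to study the classification of apple mealiness. First, the mean reflectance algorithm (MEAN) was used to extract hyperspectral scattering image features from 600 to 1000 nm, and a genetic algorithm (GA) was introduced to address class imbalance among the samples. On this basis, the mealiness classification problem was converted into the problem of finding a sparse representation of the sample to be identified over the whole training set. Simulation results show that the classification accuracy for apple mealiness with the sparse representation classification algorithm is 79.8%, higher than the 74.8% of partial least squares discriminant analysis (PLSDA), providing a new and effective method for classifying apple mealiness.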

15.
Robust feature selection for microarray data based on multicriterion fusion   (cited 1 time: 0 self-citations, 1 by others)
Feature selection often aims to select a compact feature subset to build a pattern classifier with reduced complexity, so as to achieve improved classification performance. From the perspective of pattern analysis, producing a stable or robust solution is also a desired property of a feature selection algorithm. However, the issue of robustness is often overlooked in feature selection. In this study, we analyze the robustness issue in feature selection for high-dimensional, small-sample gene-expression data, and propose to improve the robustness of feature selection algorithms by using multiple feature selection evaluation criteria. Based on this idea, a multicriterion fusion-based recursive feature elimination (MCF-RFE) algorithm is developed with the goal of improving both classification performance and the stability of feature selection results. Experimental studies on five gene-expression data sets show that the MCF-RFE algorithm outperforms the commonly used benchmark feature selection algorithm SVM-RFE.
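
The general idea of fusing several evaluation criteria inside a recursive elimination loop can be sketched as below. The abstract does not state which criteria MCF-RFE uses or how it fuses them, so the two univariate criteria and the rank-sum fusion here are stand-in assumptions, and all names are hypothetical.

```python
from statistics import mean, pstdev

def _split(col, labels):
    """Split one feature column into the values of class 1 and class 0."""
    a = [v for v, y in zip(col, labels) if y == 1]
    b = [v for v, y in zip(col, labels) if y == 0]
    return a, b

def mean_difference(col, labels):
    a, b = _split(col, labels)
    return abs(mean(a) - mean(b))

def signal_to_noise(col, labels):
    a, b = _split(col, labels)
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / spread if spread else 0.0

def fused_rfe(X, labels, criteria, n_keep):
    """Repeatedly drop the feature with the worst summed rank across all criteria."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        fused = {f: 0 for f in remaining}
        for criterion in criteria:
            scores = {f: criterion([row[f] for row in X], labels) for f in remaining}
            # Rank 0 is the best (highest-scoring) feature under this criterion.
            ordered = sorted(remaining, key=lambda f: scores[f], reverse=True)
            for rank, f in enumerate(ordered):
                fused[f] += rank
        worst = max(remaining, key=lambda f: fused[f])
        remaining.remove(worst)
    return remaining

# Hypothetical data (assumed): 4 samples x 3 features, labels 1/0
X = [[5.0, 1.0, 2.0], [5.2, 0.9, 2.5], [1.0, 1.1, 1.0], [1.1, 1.0, 1.4]]
labels = [1, 1, 0, 0]
kept = fused_rfe(X, labels, [mean_difference, signal_to_noise], n_keep=2)
```

With multivariate criteria such as SVM weights, the scores change after each elimination, which is why the loop recomputes them every round even though the univariate stand-ins used here would not strictly need it.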

16.
We propose a stochastic learning algorithm for multilayer perceptrons of linear-threshold function units, which theoretically converges with probability one and experimentally exhibits a 100% convergence rate and remarkable speed on parity and classification problems with typical generalization accuracy. For learning the n-bit parity function with n hidden units, the algorithm converged on all the trials we tested (n = 2 to 12) after 5.8 × 4.1^n presentations, taking 0.23 × 4.0^(n-6) seconds on a 533 MHz Alpha 21164A chip on average, which is five to ten times faster than the Levenberg-Marquardt algorithm with restarts. For a medium-size classification problem known as Thyroid in the UCI repository, the algorithm is faster than, and comparable in generalization accuracy to, the standard backpropagation and Levenberg-Marquardt algorithms.

17.
In the medical domain, it is very important to develop rule-based classification models, because they can produce a comprehensible model that accounts for its predictions. Moreover, it is desirable to know not only the classification decisions but also what leads to these decisions. In this paper, we propose a novel dynamic quantitative rule-based classification model, namely DQB, which integrates quantitative association rule mining and the Artificial Bee Colony (ABC) algorithm to provide users with more convenience in terms of understandability and interpretability via an accurate class quantitative association rule-based classifier model. As far as we know, this is the first attempt to apply the ABC algorithm in mining for quantitative rule-based classifier models. In addition, this is the first attempt to use quantitative rule-based classification models for classifying microarray gene expression profiles. In this research we also developed a new dynamic local search strategy named DLS, which improves the local search of the ABC algorithm. The performance of the proposed model has been compared with well-known quantitative-based classification methods and bio-inspired meta-heuristic classification algorithms, using six gene expression profiles for binary and multi-class cancer datasets. From the results, it can be concluded that DQB obtains a considerable increase in classification accuracy compared with other available algorithms in the literature, and that it is able to provide an interpretable model for biologists. This confirms the significance of the proposed algorithm in constructing a rule-based classifier model, and accordingly shows that these rules capture high-quality, meaningful knowledge extracted from the training set, where all subsets of quantitative rules report close to 100% classification accuracy with a minimum number of genes.
Remarkably, to the best of our knowledge, several new genes were discovered that have not been reported in any past studies. As for applicability, based on the results acquired from microarray gene expression analysis, we conclude that DQB can be adopted in different real-world applications with some modifications.

18.
The aim of this study was to present a new training algorithm using artificial neural networks called multi-objective least absolute shrinkage and selection operator (MOBJ-LASSO) applied to the classification of dynamic gait patterns. The movement pattern is identified by 20 characteristics from the three components of the ground reaction force, which are used as input information for the neural networks in gender-specific gait classification. The classification performance of MOBJ-LASSO (97.4%) and of the multi-objective algorithm (MOBJ) (97.1%) is similar, but the MOBJ-LASSO algorithm achieved better results than MOBJ because it is able to eliminate inputs and automatically select the parameters of the neural network. Thus, it is an effective tool for data mining using neural networks. From the 20 inputs used for training, MOBJ-LASSO selected the first and second peaks of the vertical force and the force peak in the antero-posterior direction as the variables that classify the gait patterns of the different genders.

19.
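A new method for classifying dynamic ECG waveforms based on wavelets and neural networks   (cited 1 time: 0 self-citations, 1 by others)
Wavelet analysis is used to extract the coarse-profile information of dynamic electrocardiogram (DECG) waveforms, and this profile information is then used as the input to a neural network that classifies the DECG. On the one hand, this greatly reduces the number of input points to the neural network and increases its classification speed. On the other hand, it can be regarded as compression of the DECG data: the data volume is greatly reduced while the basic morphological features are largely preserved, and the influence of noise is reduced to some extent. Experiments on data from the MIT database show that the proposed method is simple and easy to implement, and that both classification speed and classification accuracy improve over the original method.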
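
The input-reduction step described above can be illustrated with a Haar-style approximation, which halves the number of points at each level by averaging neighbouring samples. The abstract does not say which wavelet the authors used, so the Haar choice, the plain averaging (rather than the orthonormal 1/sqrt(2) scaling), and the function name are all assumptions.

```python
def haar_approximation(signal, levels):
    """Return the coarse approximation of a signal after `levels` rounds of pairwise averaging."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        # Average each non-overlapping pair; a trailing unpaired sample is dropped.
        approx = [(approx[i] + approx[i + 1]) / 2 for i in range(0, len(approx) - 1, 2)]
    return approx

# Hypothetical 8-sample waveform (assumed data)
beat = [1, 3, 5, 7, 2, 2, 0, 4]
coarse = haar_approximation(beat, levels=2)  # 8 samples reduced to 2 network inputs
```

Each level halves the number of values fed to the network, which is where the speed gain comes from, while the averaging smooths out some high-frequency noise.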

20.
The artificial immune recognition system (AIRS) classification algorithm, which has an important place among classification algorithms in the field of artificial immune systems, has shown effective and intriguing performance on the problems to which it has been applied. AIRS was previously applied to some medical classification problems, including breast cancer, Cleveland heart disease, and diabetes, and obtained very satisfactory results, so it has proved to be an efficient artificial intelligence technique in the medical field. In this study, the resource allocation mechanism of AIRS was replaced with a new one determined by fuzzy logic. This system, named fuzzy-AIRS, was used as a classifier in the diagnosis of lymph diseases, which is of great importance in medicine. The lymph diseases dataset taken from the University of California at Irvine (UCI) Machine Learning Repository was classified using the 10-fold cross-validation method. The classification accuracies reached were evaluated by comparing them with the classifiers reported on the UCI web site, in addition to other systems that have been applied to related problems. The obtained classification performance was also compared with that of AIRS with regard to classification accuracy, number of resources, and classification time. While the AIRS algorithm alone obtained 83.138% classification accuracy, fuzzy-AIRS classified the lymph diseases dataset with 90.00% accuracy. For the lymph diseases dataset, fuzzy-AIRS obtained the highest classification accuracy according to the UCI web site. Besides this success, fuzzy-AIRS gained an important advantage over AIRS in classification time. By reducing classification time as well as obtaining high classification accuracies on the applied datasets, the fuzzy-AIRS classifier showed that it could be used as an effective classifier for medical problems.
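
The 10-fold cross-validation protocol mentioned above partitions the samples into ten folds and holds each fold out once for testing. A minimal sketch of the index bookkeeping is shown here. The function name, the fixed seed, and the sample count are hypothetical, and the classifier itself is omitted.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over a shuffled index list."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k spreads the samples so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical sample count (assumed), not the size of the UCI dataset
splits = list(k_fold_indices(23, k=10))
```

The reported accuracy would then be the mean of the ten per-fold test accuracies, so every sample is used for testing exactly once.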

