Similar articles
20 similar articles found
1.
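Construction and testing of an automatic identification system for digital images of fruit flies of the genus Bactrocera (Diptera: Tephritidae)   Total citations: 2 | Self-citations: 0 | Citations by others: 2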
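For the automatic identification of fruit flies of the genus Bactrocera (Diptera: Tephritidae), this paper proposes using local binary pattern (LBP) features extracted from images of the wings and the mesonotum, combined with the AdaBoost algorithm, and designs and develops the Automated Fruit fly Identification System-Bactrocera (AFIS-B). The system comprises seven modules: image acquisition, image cropping, preprocessing, feature extraction, classifier design, identification, and display. The results show that LBP features can effectively discriminate Bactrocera species; in tests on eight Bactrocera species, the system showed high accuracy and stability, with an average identification rate above 80%. In addition, identification rates for wing and mesonotum images were tested under different conditions: uneven illumination, posture distortion, damaged specimens, and varying sample sizes. The system proved robust to uneven illumination, posture distortion, and specimen damage in the test samples. Under certain conditions, the correct identification rate was clearly positively correlated with the number of training samples per species and negatively correlated with the total number of species in the training set. This study provides theoretical, methodological, and baseline-data support for building and deploying automatic identification systems for pest tephritid flies, and can serve as a useful reference for research on and construction of automatic identification systems for other insects.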
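The LBP features described above can be illustrated with the basic 3×3, 8-neighbour operator: each interior pixel receives an 8-bit code recording which neighbours are at least as bright as it, and the image is summarized as a histogram of those codes. The sketch below is a minimal pure-Python version that assumes a grayscale image stored as a list of rows; the abstract does not state AFIS-B's LBP radius, neighbourhood, or binning, and the AdaBoost classifier stage is not shown.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code for the interior pixel (r, c)."""
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit  # set this neighbour's bit
    return code


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    rows, cols = len(img), len(img[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    patch = [[10, 20, 30],
             [40, 50, 60],
             [70, 80, 90]]
    print(lbp_code(patch, 1, 1))  # prints 120
```

The resulting histogram vector is the kind of fixed-length input a boosting classifier such as AdaBoost can consume.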

2.
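Feasibility of using mathematical morphological characteristics of moth wings to classify and identify noctuid moths   Total citations: 4 | Self-citations: 0 | Citations by others: 4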
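Abstract: To explore whether mathematical morphological characteristics (MMC) of moth wings can be used to classify and identify Noctuidae, this study used digital techniques to acquire and process insect images and extracted, from the right forewings of six noctuid species (Lepidoptera: Noctuidae), 13 mathematical morphological features that are independent of size and orientation, including rectangularity, elongation, lobation, eccentricity, sphericity, circularity, and the invariant moments Hu1, Hu2, and so on. Analysis of variance, stepwise discriminant analysis, and cluster analysis were used to assess the feasibility, reliability, and importance of each feature as a taxonomic character, and the relationships among the six species were analysed from the perspective of mathematical morphology. Rectangularity and elongation were not significant for classifying these six species, so the remaining 11 features were selected as classification variables, ranked by importance as follows: (eccentricity, Hu5, Hu7) > Hu2 > circularity > sphericity > Hu3 > (lobation, Hu1, Hu6) > Hu4. Using these wing parameters, the six species were successfully classified and identified, and the relationships among them inferred from these parameters agreed with phylogenetic views based on traditional morphology. The study shows that mathematical morphological characteristics of moth wings can be applied to the rapid identification of moths, laying a foundation for eventually automating moth identification.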
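Several of the features listed above have standard textbook definitions, but the abstract does not give the formulas the authors used. The sketch below therefore assumes common definitions: Hu1 and Hu2 computed from scale-normalized central moments of a binary wing mask (a list of 0/1 rows), and circularity as 4πA/P², which equals 1 for a perfect disc.

```python
import math


def hu1_hu2(mask):
    """First two Hu invariant moments of a non-empty binary mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = len(pts)  # zeroth moment = area in pixels
    xc = sum(x for x, _ in pts) / m00
    yc = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Central moment (translation invariant), normalized for scale:
        # eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2)
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    hu1 = e20 + e02
    hu2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return hu1, hu2


def circularity(area, perimeter):
    """Common circularity measure 4*pi*A / P**2 (1.0 for a circle)."""
    return 4 * math.pi * area / perimeter ** 2
```

Because the moments are central and scale-normalized, translating the wing within the image leaves Hu1 and Hu2 unchanged, and swapping a shape's axes (a 90° rotation) gives the same values.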

3.
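To explore the feasibility of artificial neural networks (ANN) for insect classification, this paper proposes improving the ANN by combining principal component analysis (PCA) with mathematical modeling, and validates the approach on six moth species of the family Noctuidae (Lepidoptera). First, the feature-extraction software Bugshape 1.0 was used to obtain 13 mathematical morphological features from 180 right-forewing samples of the six species. PCA then recombined these wing-shape variables into new composite variables, and a back-propagation (BP) neural network classifier was built on the principal components. The first five principal components had a cumulative contribution rate of 85.52%, retaining most of the information in the original feature variables. On this basis, a three-layer BP network classifier with 5 input nodes, 12 hidden nodes, and 1 output node was built. The classifier was trained and simulated with 120 feature vectors (20 samples per species) and validated with the remaining 60; the correlation coefficient between simulated outputs and target values was R = 0.997, and classification accuracy reached 93.33%. Compared with a classifier built from a BP network alone, without PCA, the PCA-based BP classifier performed better and classified more accurately. The results show that the proposed method discriminates and identifies species well, providing a feasible approach for identifying moths.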
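The 85.52% figure is a cumulative contribution rate: the share of total variance carried by the leading principal components, i.e. the sum of the largest eigenvalues of the covariance (or correlation) matrix divided by the sum of all eigenvalues. A minimal sketch of choosing how many components to keep; the 0.85 threshold and the example eigenvalues are illustrative, not values from the paper.

```python
def components_needed(eigenvalues, threshold=0.85):
    """Smallest k such that the k largest eigenvalues explain >= threshold
    of the total variance. Returns (k, cumulative share)."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    cumulative = 0.0
    for k, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k, cumulative / total
    # Only reached if threshold > 1.0: keep every component.
    return len(vals), 1.0


# Hypothetical eigenvalues: 5, 3 and 1 together explain 90% of the variance.
print(components_needed([5, 3, 1, 0.5, 0.5]))
```

The retained component scores would then replace the 13 raw features as inputs to the network, which is why the classifier described above has 5 input nodes.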

4.
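[Objective] Phytophagous scarab beetles are important agricultural and forestry pests in China. This study explores a new method for identifying them quickly and accurately, laying the groundwork for extending the approach to other Coleoptera. [Methods] Scarabs were identified by near-infrared (NIR) spectroscopy, using a support vector machine (SVM) algorithm to analyse NIR spectra of 15 phytophagous scarab species. After noisy bands were removed, the spectra were preprocessed by smoothing with derivatives and by standardization. Of 150 specimens, 66% of the spectra were used as the calibration set for each taxonomic rank and unit; SVM identification models were built and self-tested, and the remaining spectra were used as the prediction set to validate the models. [Results] In self-testing, the model discriminating four subfamilies of Scarabaeidae correctly identified 86% of Melolonthinae samples and more than 95% of the other samples; in the models discriminating genera within subfamilies and species within genera, accuracy was 100% for all samples except Protaetia cathaica (Bates). On the prediction set, across the models for different taxonomic ranks and units, all samples were identified with 100% accuracy except Polyphylla laticollis Lewis, which was not correctly identified because of its small sample size. The overall results were satisfactory, indicating good model performance. [Conclusion] Models built from identified scarab specimens can identify most samples well, and the model combining NIR spectroscopy with SVM shows strong generalization ability for identifying phytophagous scarabs.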
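The abstract names the preprocessing only as "smoothing with derivatives and standardization". The sketch below illustrates one plausible chain under explicit assumptions: centered moving-average smoothing (a simple stand-in for Savitzky-Golay smoothing), a first-difference derivative, and standard normal variate (SNV) standardization of each spectrum. Window size and ordering are assumptions, not the authors' settings.

```python
import statistics


def moving_average(spectrum, window=5):
    """Centered moving average; window must be odd and <= len(spectrum).
    The output is shorter by window - 1 points (edges are dropped)."""
    half = window // 2
    return [statistics.fmean(spectrum[i - half:i + half + 1])
            for i in range(half, len(spectrum) - half)]


def first_derivative(spectrum):
    """First-difference approximation of the derivative along wavelength."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def snv(spectrum):
    """Standard normal variate: center the spectrum and scale it to unit
    standard deviation (raises ZeroDivisionError for a flat spectrum)."""
    mu = statistics.fmean(spectrum)
    sd = statistics.pstdev(spectrum)
    return [(x - mu) / sd for x in spectrum]


def preprocess(spectrum, window=5):
    """Smooth, differentiate, then standardize one spectrum."""
    return snv(first_derivative(moving_average(spectrum, window)))
```

Each preprocessed spectrum would then serve as one feature vector for the SVM, with roughly two thirds of the specimens in the calibration set and the rest held out for prediction.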

5.
Automatic identification of insect sounds based on MFCC and GMM   Total citations: 1 | Self-citations: 0 | Citations by others: 1
Zhu L, Zhang Z. Acta Entomologica Sinica, 2012, 55(4): 466-471
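Insects produce sounds when moving, feeding, and calling. These sounds are similar within a species and differ between species, so they can be used to identify insects. Automatic detection of insect species from their sounds is therefore valuable for helping agricultural and forestry practitioners identify insects conveniently. This study applied sound parameterization techniques from speech recognition to the automatic acoustic identification of insects. After preprocessing, Mel-frequency cepstrum coefficients (MFCC) were extracted from the sound samples as features, and the resulting MFCC feature sets were used to train Gaussian mixture models (GMM). The trained GMMs were then used to classify insect sound samples of unknown species. Evaluated on a sound library covering 58 insect species, the method achieved high identification accuracy (average precision of 98.95%) and good time performance. These results demonstrate that speech parameterization based on MFCC and GMM can effectively identify insect species.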
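Once one GMM per species has been trained on MFCC frames, classifying an unknown recording reduces to choosing the species whose model gives the highest total log-likelihood over the recording's frames. The sketch below assumes diagonal covariances and already-trained parameters; MFCC extraction and EM training are omitted, and the species labels and numbers are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log p(frame | GMM).
    gmm: list of (weight, mean_vector, variance_vector) components."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        # log-sum-exp over components, computed stably
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total


def classify(frames, models):
    """Return the label whose GMM assigns the frames the highest likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))


# Hypothetical 2-D "MFCC" models for two species.
models = {
    "species_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "species_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
print(classify([[4.8, 5.1], [5.2, 4.9]], models))  # prints species_b
```

Summing per-frame log-likelihoods treats frames as independent, which is the usual simplification in GMM-based audio classification.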

7.
Research and application of gas chromatography in insect taxonomy   Total citations: 1 | Self-citations: 1 | Citations by others: 0
Zhu P. Entomological Knowledge, 1991, 28(2): 115-117
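With the research and application of new techniques and methods, insect taxonomy is steadily advancing toward a level that more accurately reflects species characteristics and phylogenetic relationships, and the identification of some difficult, closely related species has become possible. Taxonomists have applied the latest advances in genetics, physiology, biochemistry, and computing to insect taxonomy, for example using factors such as chromosomes, isozymes, and stable compounds in the insect body as taxonomic…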

8.
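GESTs (gene expression similarity and taxonomy similarity) is a new function-prediction method that combines gene expression similarity with a similarity measure between functional concepts in the Gene Ontology (GO), a classification system of gene functions. This prediction algorithm is extended here to protein interaction data, and several methods are proposed for selecting neighbours of a protein whose function is to be predicted in a protein interaction network. Unlike other existing protein function prediction methods, the new method automatically selects, during learning, the most suitable and most specific functional class from each functional class in the classification system, and uses interacting neighbour proteins annotated to closely related functional classes to support the prediction of that specific class. Using yeast protein interaction data provided by MIPS together with a set of gene expression profiles, and three measures designed specifically for the hierarchical structure of GO, the performance of protein function prediction was evaluated on the biological process branch of GO. The results show that the method can predict fine-grained protein functions on a large scale. In addition, the method was used to predict proteins whose functions were unknown in Gene Ontology at the end of 2004, and some of these predictions have been confirmed by the SGD annotation data released in April 2006.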

9.
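The new insect order Mantophasmatodea (Chinese: 螳䗛目), established in May 2002, caused a considerable stir in the scientific community. This paper gives a fairly detailed account of the order's discovery, morphological characteristics, and phylogenetic relationships, reports recent advances in its taxonomy and biology, provides a complete checklist, and proposes Chinese names for its families, genera, and species. The order currently comprises 3 families, 12 genera, and 15 species, 2 of which are fossil species.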

10.
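Geometric morphometrics focuses on the topological structure of biological shape and is unaffected by non-shape factors such as the size of insect specimens. This paper proposes using relative warp analysis from geometric morphometrics for insect identification and, as a methodological exploration, uses wing-venation images of six moth species of Noctuidae (Lepidoptera) as test material. First, landmarks were digitized on 180 right-forewing venation samples of the six species with TpsDig2; Procrustes superimposition was then performed in TpsSuper to remove non-shape information; finally, relative warp analysis was carried out in TpsRelw. The resulting relative warp plots give a two-dimensional visualization of insect classification, allowing species to be identified more intuitively. The results show that this approach offers a feasible method for the visual identification of moths and is significant for visualizing morphometric data in insect identification.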
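The Procrustes step removes position, scale, and rotation so that only shape differences remain. TpsSuper performs a generalized Procrustes analysis across all specimens; the sketch below shows only the simpler pairwise (ordinary) fit for 2D landmarks, written with complex numbers x + iy, where the least-squares rotation aligning shape b to a is the unit complex number S/|S| with S = Σ aₖ·conj(bₖ). Reflections are not allowed, and the landmark data are hypothetical.

```python
import math


def center_and_scale(shape):
    """Translate landmarks to their centroid and scale to unit centroid size."""
    centroid = sum(shape) / len(shape)
    centered = [z - centroid for z in shape]
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered]


def align(reference, shape):
    """Rotate 'shape' onto 'reference' (both centered and scaled) by least squares.
    Undefined if the cross term S is exactly zero."""
    s = sum(r * z.conjugate() for r, z in zip(reference, shape))
    rotation = s / abs(s)  # unit complex number e^{i*theta}
    return [z * rotation for z in shape]


def procrustes_distance(a, b):
    """Distance between two landmark configurations after removing
    position, scale and rotation."""
    a, b = center_and_scale(a), center_and_scale(b)
    b = align(a, b)
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


triangle = [0j, 1 + 0j, 1j]
turn = complex(math.cos(0.7), math.sin(0.7))
moved = [3 * z * turn + (2 - 1j) for z in triangle]  # same shape, new pose
print(procrustes_distance(triangle, moved))  # approximately 0
```

Relative warp analysis would then be run on the aligned coordinates, which is what the TpsRelw step above does.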

11.
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting: just by chance, we are likely to find some non-differentially expressed genes that classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple and intuitive, and have proved useful in empirical studies. Recently Wu proposed penalized t/F-statistics with shrinkage by formally using ℓ1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. from ℓ1-penalized regression models. We also show that penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
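The link between the ℓ1 penalty and shrunken centroids is soft thresholding: minimizing ½(d − b)² + Δ|b| over b gives b = sign(d)·max(|d| − Δ, 0). The sketch below applies this to per-gene standardized differences between a class centroid and the overall centroid. The per-gene scale (m_k(s_i + s_0) in Tibshirani et al.'s notation) is simply passed in here as an assumption rather than estimated, and the numbers are illustrative.

```python
def soft_threshold(d, delta):
    """argmin_b 0.5 * (d - b)**2 + delta * |b|: shrink d toward 0 by delta."""
    if d > delta:
        return d - delta
    if d < -delta:
        return d + delta
    return 0.0


def shrunken_centroid(overall_mean, class_mean, scale, delta):
    """Shrink each gene's class centroid toward the overall centroid.
    scale[i] stands in for m_k * (s_i + s_0) and is supplied by the caller."""
    shrunk = []
    for g, c, s in zip(overall_mean, class_mean, scale):
        d = (c - g) / s  # standardized class-minus-overall difference
        shrunk.append(g + s * soft_threshold(d, delta))
    return shrunk


# Gene 1 keeps a reduced difference; gene 2 is shrunk exactly to the overall mean.
print(shrunken_centroid([0.0, 0.0], [3.0, 0.5], [1.0, 1.0], delta=1.0))  # [2.0, 0.0]
```

Genes whose differences are shrunk to zero for every class no longer influence classification, which is how the method performs gene selection.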

13.
Genomics, 2022, 114(2): 110264
Cancer is one of the major causes of human death each year. In recent years, cancer identification and classification using machine learning have gained momentum owing to the availability of high-throughput sequencing data. With RNA-seq, cancer research is advancing rapidly, and new insights into cancer and related treatments are coming to light. In this paper, we propose PanClassif, a method that requires only a small number of effective genes to detect cancer from RNA-seq data and that improves performance across a wide range of machine learning classifiers. We used samples of 22 cancer types from The Cancer Genome Atlas (TCGA), comprising 8287 cancer samples and 680 normal samples. First, PanClassif applies k-nearest neighbour (k-NN) smoothing to the samples to handle noise in the data. Effective genes are then selected with an ANOVA-based test. To balance the training data, PanClassif applies an oversampling method, SMOTE. We performed comprehensive experiments on the datasets using several classification algorithms. The experimental results show that PanClassif outperforms existing state-of-the-art methods and performs consistently on two single-cell RNA-seq datasets taken from the Gene Expression Omnibus (GEO). PanClassif improves the performance of a wide variety of classifiers for both binary cancer prediction and multi-class cancer classification. PanClassif is available as a Python package (https://pypi.org/project/panclassif/). All source code and materials of PanClassif are available at https://github.com/Zwei-inc/panclassif.
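SMOTE balances classes by synthesizing new minority-class points on the line segment between a minority sample and one of its nearest minority-class neighbours. The following is a from-scratch illustration of that idea, not PanClassif's own code (which the abstract does not show); the parameters k and seed and the toy data are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a minority class (>= 2 samples).
    Each new point lies between a random minority sample and one of its
    k nearest minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(minority, n_new=3, k=2, seed=1))
```

Because every synthetic point is an interpolation between two real samples, it stays inside the region spanned by the minority class; oversampling should be applied to training data only, as the abstract describes.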

14.
Many field studies of insects have focused on the adult stage alone, likely because the immature stages of most insect species are unknown. Molecular species identification (e.g., DNA barcoding) has helped ascertain the immature stages of many insects, but it cannot identify larval developmental stages (instars). Identifying the growth stage of collected individuals is indispensable from both ecological and taxonomic perspectives. Using the relationship between larval and adult body size across species, I present a novel technique for identifying the instar of field-collected insect larvae whose species has been determined by molecular identification. The method assumes that classification functions derived from discriminant analyses, with larval instar as the response variable and adult and larval body sizes as explanatory variables, can determine the instar of a larval specimen not included in the original data set, even at the species level. This size relationship among larval instars has been demonstrated for many insects (Dyar's rule), but no previous attempt has included the adult stage. Analysis of a test data set from the beetle family Carabidae (Coleoptera) showed that classification functions obtained from data sets of related species achieved correct classification rates of 81–100%. Because no reliable method has been established for identifying the instar of field-collected insect larvae, this accuracy may be sufficient for analysing field-collected samples. The chief advantage of the technique is that the instar can be identified even when only one specimen per species is available, provided that classification functions have been determined for the group to which the focal species belongs. Similar classification functions should be created for other insect groups. Used together with molecular species identification, such functions would allow future studies to include larval stages as well as adults.
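The method above relies on discriminant-analysis classification functions fitted to adult and larval body sizes. As a much simpler stand-in, and clearly not the author's procedure, the sketch below assumes that, in the spirit of Dyar's rule, the log ratio of larval to adult size is roughly constant within an instar across related species. With this single predictor, equal priors, and a shared variance, a linear discriminant rule reduces to assigning the nearest instar mean. All records are hypothetical.

```python
import math
from collections import defaultdict


def fit_instar_means(records):
    """records: (instar, larval_size, adult_size) from reference species.
    Returns the mean log(larval / adult) size ratio for each instar."""
    groups = defaultdict(list)
    for instar, larva, adult in records:
        groups[instar].append(math.log(larva / adult))
    return {instar: sum(v) / len(v) for instar, v in groups.items()}


def predict_instar(means, larval_size, adult_size):
    """Assign the instar whose mean log ratio is closest to the specimen's."""
    r = math.log(larval_size / adult_size)
    return min(means, key=lambda instar: abs(means[instar] - r))


# Hypothetical reference data: (instar, larval size, adult size), same units.
reference = [(1, 1.0, 5.0), (1, 1.2, 6.0),
             (2, 1.75, 5.0), (2, 2.1, 6.0),
             (3, 3.0, 5.0), (3, 3.6, 6.0)]
means = fit_instar_means(reference)
print(predict_instar(means, larval_size=2.0, adult_size=5.5))  # prints 2
```

As in the abstract, a single larva can be assigned an instar as long as the adult size of its (molecularly identified) species is known and reference functions exist for its group.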

15.
An important characteristic of a protein is its location, or compartment, in the cell, because a protein has to be located in its proper position to perform its biological functions. Predicting protein subcellular location is therefore an important and challenging task in current molecular and cellular biology. In this paper, based on the AdaBoost.ME algorithm and Chou's PseAAC (pseudo amino acid composition), a new computational method was developed to identify protein subcellular location. AdaBoost.ME is an improved version of the AdaBoost algorithm that extends it directly to multi-class problems without reducing them to multiple two-class problems. Some previous studies used the conventional amino acid composition to represent protein samples; to take sequence-order effects into account, this study uses Chou's PseAAC instead. To demonstrate that AdaBoost.ME is a robust and efficient model for predicting protein subcellular locations, the same protein dataset used by Cedano et al. (Journal of Molecular Biology, 1997, 266: 594-600) is adopted. The results show that our method achieves higher accuracy than the methods developed by previous investigators.
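Chou's PseAAC extends the 20 amino-acid composition features with λ sequence-order correlation factors θₖ, each the mean squared difference of a standardized physicochemical property between residues k positions apart; the 20 + λ components are then normalized by Σf + wΣθ so that they sum to 1. Chou's formulation averages several properties; the sketch below simplifies to a single property (Kyte-Doolittle hydropathy), and λ = 3 and w = 0.05 are assumptions, not this paper's settings.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy: the single property used in this simplified sketch.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def standardize(prop):
    """Rescale property values to mean 0 and unit std over the 20 residues."""
    values = list(prop.values())
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return {aa: (v - mu) / sd for aa, v in prop.items()}


def pseaac(seq, lam=3, w=0.05, prop=KD):
    """Type-1 pseudo amino acid composition (20 + lam values summing to 1).
    seq must contain only the 20 standard one-letter residue codes."""
    h = standardize(prop)
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):
        # k-th tier correlation factor: mean squared property difference
        diffs = [(h[seq[i + k]] - h[seq[i]]) ** 2 for i in range(n - k)]
        thetas.append(sum(diffs) / (n - k))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vector = pseaac("ACDEFGHIKLMNPQRSTVWYAC")
print(len(vector))  # prints 23
```

The resulting fixed-length vector is what a classifier such as AdaBoost.ME would receive for each protein.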

16.
Conotoxins are disulfide-rich small peptides that target a broad spectrum of ion channels and neuronal receptors. They offer promising avenues in the treatment of chronic pain, epilepsy and cardiovascular diseases. Computationally assigning newly sequenced mature conotoxins to appropriate superfamilies could provide valuable preliminary information on the biological and pharmacological functions of the toxins. However, protein sequence patterns created for the reliable identification and classification of new conotoxin sequences may not be effective because of the hypervariability of mature toxins. With the aim of formulating an in silico approach for classifying conotoxins into superfamilies, we have used the concept of pseudo-amino acid composition to represent a peptide in a mathematical framework that includes the sequence-order effect along with conventional amino acid composition. The polarity index attribute, which encodes information such as residue surface buriability, polarity, and hydropathy, was used to capture the sequence-order effect. Several methods, including BLAST, the ISort (Intimate Sorting) predictor, the least Hamming distance algorithm, the least Euclidean distance algorithm and multi-class support vector machines (SVMs), were explored for superfamily identification. In the jackknife cross-validation test on the A, M, O and T superfamilies plus a negative set of short cysteine-rich sequences from various eukaryotes with diverse functions, the SVMs outperformed the other methods, giving an overall accuracy of 88.1% for all correct predictions and a generalized squared correlation of 0.75. The sensitivity and specificity for the superfamilies ranged from 84.0% to 94.1% and from 80.0% to 95.5%, respectively, attesting to the efficacy of multi-class SVMs for the in silico classification of conotoxins into their superfamilies.
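The jackknife test mentioned above is leave-one-out cross-validation: each sequence is predicted by a model built from all the others. The sketch below uses a simple nearest-neighbour Euclidean-distance rule as a stand-in classifier (the paper's SVMs and its exact least-Euclidean-distance algorithm are not reproduced), and computes one-versus-rest sensitivity and specificity using the conventional TN/(TN + FP) definition of specificity; some papers of this period define "specificity" as TP/(TP + FP) instead. Feature vectors and labels are hypothetical.

```python
import math


def nearest_label(query, train):
    """Label of the training vector closest to query (Euclidean distance)."""
    return min(train, key=lambda item: math.dist(query, item[0]))[1]


def jackknife_predictions(samples):
    """Leave-one-out: predict each (vector, label) sample from all others."""
    return [nearest_label(x, samples[:i] + samples[i + 1:])
            for i, (x, _) in enumerate(samples)]


def sensitivity_specificity(y_true, y_pred, positive):
    """One-versus-rest sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


samples = [([0.0, 0.0], "A"), ([0.0, 1.0], "A"), ([5.0, 5.0], "M"), ([5.0, 6.0], "M")]
predicted = jackknife_predictions(samples)
accuracy = sum(p == y for p, (_, y) in zip(predicted, samples)) / len(samples)
print(accuracy)  # prints 1.0
```

The jackknife is deterministic for a given dataset, which is one reason it is often preferred for reporting accuracy on small sequence datasets.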

17.
To classify proteins into functional families based on their primary sequences, popular algorithms such as the k-NN-, HMM-, and SVM-based algorithms are often used. For many of these algorithms to perform their tasks, protein sequences need to be properly aligned first. Since the alignment process can be error-prone, protein classification may not be performed very accurately. To improve classification accuracy, we propose an algorithm, called the Unaligned Protein SEquence Classifier (UPSEC), which can perform its tasks without sequence alignment. UPSEC makes use of a probabilistic measure to identify residues that are useful for classification in both positive and negative training samples, and can handle multi-class classification with a single classifier and a single pass through the training data. UPSEC has been tested with real protein data sets. Experimental results show that UPSEC can effectively classify unaligned protein sequences into their corresponding functional families, and the patterns it discovers during the training process can be biologically meaningful.

18.
Xu P, Yang P, Lei X, Yao D. PLoS ONE, 2011, 6(1): e14634

Background

There is a growing interest in the study of signal processing and machine learning methods, which may make the brain computer interface (BCI) a new communication channel. A variety of classification methods have been utilized to convert the brain information into control commands. However, most of the methods only produce uncalibrated values and uncertain results.

Methodology/Principal Findings

In this study, we presented a probabilistic method “enhanced BLDA” (EBLDA) for multi-class motor imagery BCI, which utilized Bayesian linear discriminant analysis (BLDA) with probabilistic output to improve the classification performance. EBLDA builds a new classifier that enlarges training dataset by adding test samples with high probability. EBLDA is based on the hypothesis that unlabeled samples with high probability provide valuable information to enhance learning process and generate a classifier with refined decision boundaries. To investigate the performance of EBLDA, we first used carefully designed simulated datasets to study how EBLDA works. Then, we adopted a real BCI dataset for further evaluation. The current study shows that: 1) Probabilistic information can improve the performance of BCI for subjects with high kappa coefficient; 2) With supplementary training samples from the test samples of high probability, EBLDA is significantly better than BLDA in classification, especially for small training datasets, in which EBLDA can obtain a refined decision boundary by a shift of BLDA decision boundary with the support of the information from test samples.

Conclusions/Significance

The proposed EBLDA could potentially reduce training effort. It is therefore valuable for realizing an effective online BCI system, especially a multi-class BCI system.
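The core idea of EBLDA, enlarging the training set with test samples that the current classifier labels with high probability and then refitting, is a form of self-training. The sketch below illustrates that loop with a deliberately simple probabilistic classifier: one-dimensional Gaussian classes with a pooled variance and equal priors. It is not Bayesian LDA, and the threshold, number of rounds, and data are assumptions.

```python
import math
import statistics


def fit(data):
    """data: list of (x, label) with scalar x. Returns class means and pooled variance."""
    labels = sorted({y for _, y in data})
    means = {c: statistics.fmean(x for x, y in data if y == c) for c in labels}
    var = sum((x - means[y]) ** 2 for x, y in data) / len(data)
    return means, var


def posteriors(model, x):
    """Class posteriors under equal priors and a shared variance."""
    means, var = model
    logs = {c: -(x - m) ** 2 / (2 * var) for c, m in means.items()}
    peak = max(logs.values())
    z = sum(math.exp(v - peak) for v in logs.values())
    return {c: math.exp(v - peak) / z for c, v in logs.items()}


def self_train(train, unlabeled, threshold=0.95, rounds=3):
    """Repeatedly add confidently labelled unlabeled samples, then refit."""
    train, pool = list(train), list(unlabeled)
    for _ in range(rounds):
        model = fit(train)
        confident, remaining = [], []
        for x in pool:
            post = posteriors(model, x)
            label = max(post, key=post.get)
            if post[label] >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break
        train += confident
        pool = remaining
    return fit(train)


labelled = [(-2.0, "left"), (-1.0, "left"), (1.0, "right"), (2.0, "right")]
means, var = self_train(labelled, [-3.0, -1.5, 0.05, 1.5, 3.0])
print(means)  # the ambiguous point 0.05 is never added
```

Only samples far from the decision boundary are absorbed, so the refitted class means move while ambiguous samples near the boundary are left out, which mirrors the "refined decision boundary" argument in the abstract.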

19.

Background  

Gene expression microarrays are a powerful technology for the genetic profiling of diseases and their associated treatments. The process involves a key step of identifying biomarkers, which are expected to be closely related to the disease. A key use of the identified genes is to construct a classifier that can effectively diagnose the disease and even recognize its subtypes. Binary classification (e.g., diseased versus healthy) of microarray data has been successful, whereas multi-class classification, such as cancer subtyping, remains challenging.

20.
ABSTRACT

We report on our research efforts towards developing efficient equipment for the automatic recognition of insects using only the acoustic modality. Specifically, we deal with three groups of insects, namely the crickets, cicadas and katydids. Inspired by well-documented tactics of speech processing, the signal processing employed in the present work is elaborated further with respect to the sound production mechanisms of insects. In order to improve the practical efficacy of our equipment, we adopt a score-level fusion of classifiers with non-parametric (probabilistic neural network) and parametric (Gaussian mixture models) estimation of the probability density function. An efficient hierarchic classification scheme is introduced, where the identification of unlabelled input takes place at various levels of hierarchy, such as suborder, family, subfamily, genus and species. We evaluate the practical significance of our approach on a large and well-documented catalogue of recordings of crickets, cicadas and katydids. For the hierarchic classification scheme, we report identification accuracy that exceeds 99% at suborder and family levels. In the straight classification scheme, we report accuracy of 90% for 307 species.
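Score-level fusion combines the per-class outputs of two classifiers before the final decision. The abstract does not specify the fusion rule or score normalization, so the sketch below assumes a generic scheme: each classifier's per-class log-scores are converted to a probability-like distribution with a softmax, and the two distributions are averaged with a weight. The weight of 0.5 and all scores are hypothetical.

```python
import math


def normalize_scores(log_scores):
    """Softmax over per-class log-scores so that classifiers become comparable."""
    peak = max(log_scores.values())
    exps = {c: math.exp(s - peak) for c, s in log_scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def fuse(pnn_scores, gmm_scores, weight=0.5):
    """Weighted-sum fusion of two classifiers' normalized scores.
    Returns (best_label, fused_scores); both inputs must share class labels."""
    p, g = normalize_scores(pnn_scores), normalize_scores(gmm_scores)
    fused = {c: weight * p[c] + (1 - weight) * g[c] for c in p}
    return max(fused, key=fused.get), fused


# Hypothetical log-scores for one recording.
pnn = {"cricket": math.log(0.6), "cicada": math.log(0.3), "katydid": math.log(0.1)}
gmm = {"cricket": -10.0, "cicada": -9.0, "katydid": -20.0}
label, scores = fuse(pnn, gmm)
print(label)  # prints cicada
```

In a hierarchical scheme like the one described, the same fusion could be applied at each level (suborder, family, and so on), restricting the candidate labels to the children of the class chosen at the level above.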


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Filing No. 09084417