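Merging vs. non-merging: a comparison of two similarity clustering analysis methods
Citation: LIU Xintao, LIU Xiaoguang, SHEN Qi, ZHANG Shujie, YANG Dangwei, REN Yingdang. Merging vs. non-merging: a comparison of two similarity clustering analysis methods[J]. Acta Ecologica Sinica, 2013, 33(11): 3480-3487.
Authors: LIU Xintao, LIU Xiaoguang, SHEN Qi, ZHANG Shujie, YANG Dangwei, REN Yingdang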
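Affiliations: 1. Institute of Plant Protection, Henan Academy of Agricultural Sciences; Key Laboratory of Crop Pests and Diseases Control of Henan Province; Key Laboratory of Integrated Pest Management on Crops in the Southern Region of North China, Ministry of Agriculture, Zhengzhou 450002
2. Henan University of Traditional Chinese Medicine, Zhengzhou 450008
3. Department of Bioengineering, Zhengzhou University, Zhengzhou 450001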
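Funding: Henan Province Basic and Frontier Technology Research Program (Insect Fauna of Henan Province, 082300430370); Henan Province Key Laboratory Construction Special Project (Geographical Distribution and Regionalization of Insects in Henan, 112300413221)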
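Abstract: Three datasets were used as samples:
- the distribution of 4638 insect species across 7 geographic subregions of Shanxi Province;
- the distribution of 7766 insect species across 14 geographic subregions of Inner Mongolia;
- the distribution of 16804 insect genera across 67 ecological regions of China.

Each dataset was analyzed with two methods. The first is the traditional similarity clustering analysis (SCA), which merges regions level by level. The second is the new multivariate similarity clustering analysis (MSCA), which requires no merging. In every case, the non-merged method produced results consistent with statistical logic as well as with geographic and biological logic. When few subregions are involved, the merged method can still produce results similar to those of the non-merged method. As the number of subregions grows, however, its clustering structure changes until its clustering function is lost entirely. Whether the difference between the two clustering results is large or small, the two differ fundamentally in nature. In the non-merged method, the similarity coefficients are inherent, mutually independent, and coexisting, and the clustering result represents how close or distant every subregion is from every other. In the merged method, every similarity coefficient is either the basis for or the result of a merge. Each coefficient is the condition for producing the next one, and the next one arises from the elimination of the previous one, in strict order. By the time the last coefficient is produced, all earlier coefficients and all the subregions have ceased to exist, so the clustering result merely records a process of continual merging and elimination. While affirming the historical value of the merged method, we therefore consider that the multivariate similarity coefficient formula and the MSCA method created by Shen Xiaocheng et al. discard merging with order reduction, which is the root of the bias and errors. As a result, they can produce relatively objective clustering results. MSCA is an effective clustering tool for biogeographic research and will move quantitative biogeography into a new stage.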
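The abstract attributes MSCA to the multivariate similarity coefficient formula of Shen Xiaocheng et al. but does not state the formula. The sketch below assumes the form commonly attributed to that work: SI = (sum of S_j / S - 1) / (T - 1). Here S_j is the number of taxa in region j, and S is the number of distinct taxa across all T regions. This form is an assumption and is not taken from this paper. For T = 2 it reduces to the Jaccard index. The example illustrates the property the abstract stresses: every coefficient is computed directly from the raw distribution data. The coefficients are therefore independent of one another and can be computed in any order. The region names and taxa are hypothetical.

```python
from itertools import combinations


def multivariate_similarity(regions: list[set[str]]) -> float:
    """Multivariate similarity coefficient of T >= 2 taxon sets.

    ASSUMED form (not stated in this paper's abstract):
        SI = (sum_j S_j / S - 1) / (T - 1)
    where S_j is the number of taxa in region j and S is the number of
    distinct taxa in the union of all T regions. The value is 0 when no
    taxon is shared and 1 when all sets are identical; for T = 2 it equals
    the Jaccard index.
    """
    t = len(regions)
    if t < 2:
        raise ValueError("need at least two regions")
    total = len(set().union(*regions))  # S: distinct taxa across all regions
    if total == 0:
        raise ValueError("regions contain no taxa")
    summed = sum(len(r) for r in regions)  # sum of S_j over all regions
    return (summed / total - 1) / (t - 1)


if __name__ == "__main__":
    # Hypothetical toy data: taxon lists for three regions.
    regions = {
        "A": {"a", "b", "c"},
        "B": {"b", "c", "d"},
        "C": {"c", "e"},
    }
    # Each coefficient comes straight from the raw data; none is derived
    # from another, so the iteration order does not matter.
    for k in (2, 3):
        for names in combinations(sorted(regions), k):
            si = multivariate_similarity([regions[n] for n in names])
            print(names, round(si, 3))
```

Because no region is ever merged away, the pairwise and three-way coefficients can be computed and compared side by side.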

Keywords: multivariate similarity clustering analysis; multivariate similarity coefficient; biogeography
Received: 3/9/2012
Revised: 2012/12/14

Comparison of merged and non-merged similarity clustering analysis methods
LIU Xintao,LIU Xiaoguang,SHEN Qi,ZHANG Shujie,YANG Dangwei and REN Yingdang.Comparison of merged and non-merged similarity clustering analysis methods[J].Acta Ecologica Sinica,2013,33(11):3480-3487.
Authors: LIU Xintao, LIU Xiaoguang, SHEN Qi, ZHANG Shujie, YANG Dangwei and REN Yingdang
Institutions: 1. Institute of Plant Protection, Henan Academy of Agricultural Sciences; The Key Laboratory of Crop Pests and Diseases Control of Henan Province; The Key Laboratory of Integrated Pest Management on Crops in the Southern Region of North China, Ministry of Agriculture of China, Zhengzhou 450002, China (LIU Xintao, LIU Xiaoguang, YANG Dangwei, REN Yingdang)
2. Henan University of Traditional Chinese Medicine, Zhengzhou 450008, China (SHEN Qi)
3. Bioengineering Department of Zhengzhou University, Zhengzhou 450001, China (ZHANG Shujie)
Abstract: Three distribution datasets were examined:
- 4638 insect species in seven geographic regions of Shanxi Province (a small sample);
- 7766 species in 14 geographic regions of Inner Mongolia (a medium sample);
- 16804 genera in 67 ecological regions of China (a large sample).

Each dataset was analyzed separately with a traditional merged method, similarity clustering analysis (SCA), and a new non-merged method, multivariate similarity clustering analysis (MSCA). A critical comparison of the two methods shows that the non-merged method produces results consistent with the logic of statistics as well as that of biology and geography, regardless of the size of the dataset. The merged method (SCA) may produce a result closely resembling that of the non-merged method when few geographic regions are involved. As the number of geographic regions increases, however, the clustering structure produced by the merged method changes, to the point where it loses its clustering function entirely. Regardless of how much the results of the two kinds of clustering differ, the two are fundamentally different in nature. In the non-merged method, the similarity coefficients are inherent, independent of each other, and exist simultaneously. The clustering result reflects the relationships and distances among all the geographic regions involved, and the coefficients can be calculated directly, in no particular order. In the merged method, by contrast, every coefficient is either the basis for or the result of a merge. Each coefficient is the precondition for generating the next one, and the next one arises from the elimination of the previous one after merging. Every calculation depends on the data left by the preceding merges, so the results must be derived in strict sequence.
By the time the final coefficient is generated, all earlier coefficients and all the original geographic regions have been eliminated. New coefficients are generated continually, only to disappear in the next iteration, so the clustering result merely records this process of repeated merging and elimination. While acknowledging the value and major contribution of SCA, we find that MSCA corrects the errors and inaccuracies caused by merging and order reduction during SCA clustering. In particular, it avoids losing branches of the clustering result that are important for showing relationships. It also avoids SCA's failure to resolve similarity levels that need to be shown in detail. In summary, MSCA solves many of the problems of SCA: its clustering is more accurate, and its results fit ecological reality more closely. In addition, our modified MSCA method can readily perform macroscopic clustering analysis of ecosystem data, which had not previously been fully accomplished.
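The following is a minimal sketch of a merged, SCA-style agglomerative procedure. It assumes two illustrative choices, because the abstract does not specify the similarity index or the merge rule:
- similarity is measured with the Jaccard index;
- two merged regions are replaced by the union of their taxa.

The helper names (`jaccard`, `merged_clustering`) and the toy data are hypothetical. After each merge, the coefficients involving the two merged regions are discarded and new ones are computed on the reduced set. Each recorded level therefore depends on every earlier merge, which is the sequential dependence described above.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared taxa divided by all taxa in either set (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merged_clustering(regions: dict[str, set[str]]) -> list[tuple[str, str, float]]:
    """Repeatedly merge the most similar pair until one cluster remains.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = {name: set(taxa) for name, taxa in regions.items()}
    history = []
    while len(clusters) > 1:
        # Coefficients are recomputed on the current, already-reduced clusters;
        # those involving regions merged earlier no longer exist.
        (a, b), sim = max(
            (((x, y), jaccard(clusters[x], clusters[y]))
             for x, y in combinations(sorted(clusters), 2)),
            key=lambda item: item[1],
        )
        merged = f"({a}+{b})"
        # The two regions vanish and are replaced by their pooled taxon list.
        clusters[merged] = clusters.pop(a) | clusters.pop(b)
        history.append((a, b, sim))
    return history


if __name__ == "__main__":
    toy = {"A": {"a", "b", "c"}, "B": {"a", "b", "d"}, "C": {"x", "y"}}
    for step in merged_clustering(toy):
        print(step)
```

Only the first recorded similarity is computed from two original regions. Every later one compares clusters built by earlier merges, so which regions end up joined, and at what level, depends on the path taken.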
Keywords: multivariate similarity clustering analysis; multivariate similarity coefficients; biogeography