The Method Clustering the Information Feature Units Based on Improved EMD Distance

Cited by: 1

Authors: LAN Hui-hong [1], HUANG Jin-de [1] (Department of Mathematics and Information Science, Guangxi College of Education, Nanning 530023, China)

Affiliation: [1] Department of Mathematics and Information Science, Guangxi College of Education

Source: Journal of Shandong Agricultural University (Natural Science Edition), 2019, No. 5, pp. 885-888, 920 (5 pages)

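Funding: Scientific research project of the Guangxi Department of Education: Research on ASEAN Cross-Language Query Expansion Models and Algorithms Based on Text Clustering (2019KY1678)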

Abstract: This paper studies a clustering method for information feature units based on an improved EMD distance. Information feature units are extracted with the vector space method, the EMD ground distance is taken as the distance between different information feature units, and the units are treated as suppliers and consumers in a transportation problem. Clustering with the plain EMD distance can over-segment the information feature units, increase the number of positive examples, and leave some suppliers unable to supply. To avoid these problems, a similarity threshold is set that increases the weights of suppliers meeting the feature-similarity condition, so that transportation is carried mainly by low-cost suppliers; this yields the improved EMD distance. The improved EMD distance algorithm is then used to cluster the information feature units. Verification on a simulation platform shows that the method reaches a clustering accuracy above 99% for different types of information feature units, such as text and stock data, and that the clustering process needs few iterations and performs well.
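
For orientation, the supplier/consumer analogy in the abstract matches the standard transportation-problem formulation of the EMD, sketched below. This is the textbook definition, not the improved distance proposed in the paper: the abstract does not give the threshold-based weight adjustment, so it is not reproduced here. All symbols are assumptions introduced for illustration: suppliers $p_i$ with weights $w_{p_i}$ ($i = 1,\dots,m$), consumers $q_j$ with weights $w_{q_j}$ ($j = 1,\dots,n$), ground distance $d_{ij}$ between $p_i$ and $q_j$, and flow $f_{ij}$ shipped from $p_i$ to $q_j$.

```latex
% Standard EMD as a transportation problem (symbols are illustrative assumptions)
\begin{aligned}
\min_{f}\quad & \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{s.t.}\quad
  & f_{ij} \ge 0, && 1 \le i \le m,\ 1 \le j \le n, \\
  & \sum_{j=1}^{n} f_{ij} \le w_{p_i}, && 1 \le i \le m, \\
  & \sum_{i=1}^{m} f_{ij} \le w_{q_j}, && 1 \le j \le n, \\
  & \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}
      = \min\!\Bigl(\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j}\Bigr).
\end{aligned}
\qquad
\mathrm{EMD}(P, Q) =
  \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} f^{*}_{ij}\, d_{ij}}
       {\sum_{i=1}^{m}\sum_{j=1}^{n} f^{*}_{ij}}
```

Here $f^{*}$ denotes an optimal flow. In this reading, raising the weight $w_{p_i}$ of a supplier that is similar to a consumer (small $d_{ij}$) lets more of the flow travel along cheap routes, which appears to be the effect the abstract attributes to its similarity threshold.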

Keywords: EMD distance; information feature unit; clustering method

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
