Authors: LAN Hui-hong (兰慧红)[1]; HUANG Jin-de (黄紧德)[1]
Affiliation: [1] Department of Mathematics and Information Science, Guangxi College of Education, Nanning 530023, China
Source: Journal of Shandong Agricultural University (Natural Science Edition), 2019, No. 5, pp. 885-888, 920 (5 pages)
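Funding: Research project of the Guangxi Department of Education, "Research on an ASEAN cross-language query expansion model and algorithms based on text clustering" (2019KY1678)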
Abstract: This paper studies a method for clustering information feature units based on an improved EMD (Earth Mover's Distance). Information feature units are extracted with the vector space method, the EMD ground distance is used as the distance between different information feature units, and the units are treated as suppliers and consumers in a transportation problem. Clustering directly on the EMD distance can over-segment the information feature units, increase the number of positive cases, and leave suppliers unable to supply. To avoid these problems, a similarity threshold is set so that suppliers satisfying the feature-similarity condition receive an increased weight, and the threshold makes transportation rely mainly on low-cost suppliers; this yields the improved EMD distance. The improved EMD distance algorithm is then used to cluster the information feature units. Verification on a simulation platform shows that the method clusters different types of information feature units, such as text and stock data, with an accuracy above 99%, and that the clustering process needs few iterations and performs well.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
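The supplier/consumer framing in the abstract can be illustrated with a minimal sketch of the standard (unmodified) EMD. It is not the authors' improved algorithm: the similarity threshold and the weight increase are not specified in this record and are left out. The function name `emd_uniform`, the Euclidean ground distance, and the restriction to two equal-size sets with uniform weights are all assumptions made for illustration. Under those assumptions an optimal flow is a one-to-one matching, so a brute-force search over matchings is exact for small sets.

```python
from itertools import permutations
from math import dist


def emd_uniform(suppliers, consumers):
    """EMD between two equal-size sets of feature vectors with uniform weights.

    Hypothetical helper for illustration only; it omits the paper's
    similarity threshold and weight adjustment.
    """
    if not suppliers or len(suppliers) != len(consumers):
        raise ValueError("this sketch needs two non-empty sets of equal size")
    n = len(suppliers)
    # Ground distance (assumed): Euclidean distance between feature vectors.
    cost = [[dist(s, c) for c in consumers] for s in suppliers]
    # With uniform weights an optimal transport plan is a one-to-one matching,
    # so trying every permutation is exact (only feasible for small n).
    best = min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )
    # Normalize by the total flow (n units of weight 1 each).
    return best / n


if __name__ == "__main__":
    unit_a = [(0.0, 0.0), (1.0, 0.0)]
    unit_b = [(0.0, 1.0), (1.0, 1.0)]
    print(emd_uniform(unit_a, unit_b))  # 1.0
```

A clustering step would then use these pairwise distances in place of a plain vector distance. For sets with unequal sizes or non-uniform weights, the transportation problem would need a linear-programming solver instead of the permutation search.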