Affiliation: [1] Shenhua Group, Beijing 100011
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2012, No. 7, pp. 709-714 (6 pages)
Abstract: Material classification is a fundamental task in enterprise material management. In large enterprises the number of materials and categories is too large to classify by hand, so automatic classification techniques are needed to improve efficiency. In automatic classification, the similarity between material names is one of the key factors affecting classification quality; however, traditional string similarity metrics are not well suited to Chinese material names. After analyzing the characteristics of material name strings and the Jaro-Winkler algorithm, this paper proposes a classification-oriented Chinese string similarity metric that dynamically estimates the weights of the suffixes in Chinese material names. Experiments on a real material classification dataset show that the dynamic-weighting similarity metric outperforms traditional metrics and effectively improves classification accuracy.
Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
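For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the standard Jaro-Winkler similarity. It is not the paper's dynamic-weighting method, which the abstract does not specify in detail. The function names and the `prefix_scale` and `max_prefix` parameters are illustrative assumptions; the values 0.1 and 4 are the conventional defaults for this metric. Note that the standard metric rewards a shared prefix, whereas the abstract describes weighting suffixes.

```python
def jaro(s: str, t: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    s_matched = [c for c, hit in zip(s, s_hit) if hit]
    t_matched = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_matched, t_matched)) // 2
    return (
        matches / len(s)
        + matches / len(t)
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s: str, t: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    sim = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


if __name__ == "__main__":
    # Classic textbook pair; also works on Chinese strings,
    # since Python compares Unicode code points.
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    print(jaro_winkler("不锈钢螺栓", "碳钢螺栓"))
```

Because the prefix bonus only looks at leading characters, two names that share a trailing part but begin differently receive no boost from it, which is consistent with the abstract's remark that traditional metrics fit Chinese material names poorly.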