Short Text Classification Based on Knowledge Graph Extension

Cited by: 5

Authors: 丁连红 (DING Lianhong) [1]; 孙斌 (SUN Bin); 张宏伟 (ZHANG Hongwei) (School of Information, Beijing Wuzi University, Beijing 101149, China)

Affiliation: [1] School of Information, Beijing Wuzi University, Beijing 101149, China

Source: Technology Intelligence Engineering (《情报工程》), 2018, No. 5, pp. 38-46 (9 pages)

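Funding: Beijing Social Science Fund, Youth Project "Research on the Evolution Mechanism of Consumption Behavior in Social E-commerce and Guiding Measures" (17GLC066); Beijing Wuzi University High-Level Cultivation Project (GJB20162002)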

Abstract: The Concept Graph is a large-scale knowledge graph that Microsoft built from statistical analysis of user search logs. To address the data sparsity, sensitivity to noise, and unclear topics that affect short texts in text classification, this paper proposes a semantic extension representation for short texts based on the Concept Graph. First, the relevance between the feature words of the text and the concepts in the Concept Graph is computed, and the top k concepts with the highest relevance are selected to form the concept dictionary of the current text. Then, the concept dictionary is added to the feature word set to obtain the semantically extended representation of the short text. Classification experiments were run on short texts from Twitter before and after extension, covering 5 classification algorithms and 6 relevance calculation methods. The results show that the conceptualized semantic extension improves short text classification, and that the more extensible feature words a text contains, the larger the improvement.
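The two extension steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lookup table `TOY_CONCEPT_SCORES` and its numbers are invented, the function names are hypothetical, and summing per-word scores stands in for whichever of the paper's 6 relevance measures is used (the abstract does not name them).

```python
from collections import defaultdict

# Hypothetical toy lookup: feature word -> {concept: relevance score}.
# The values are invented for illustration; they do not come from the paper
# or from the Microsoft Concept Graph.
TOY_CONCEPT_SCORES = {
    "python": {"programming language": 0.8, "snake": 0.2},
    "java": {"programming language": 0.7, "island": 0.1},
    "tweet": {"social media post": 0.9},
}


def build_concept_dictionary(feature_words, concept_scores, k):
    """Return the k concepts with the highest summed relevance to the text.

    Assumption: relevance of a concept to a text is the sum of its
    per-word scores. The paper evaluates 6 relevance measures instead.
    """
    totals = defaultdict(float)
    for word in feature_words:
        for concept, score in concept_scores.get(word, {}).items():
            totals[concept] += score
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:k]]


def extend_short_text(feature_words, concept_scores, k=2):
    """Append the selected concepts to the original feature words."""
    concepts = build_concept_dictionary(feature_words, concept_scores, k)
    return list(feature_words) + concepts
```

Calling `extend_short_text(["python", "java", "tweet"], TOY_CONCEPT_SCORES, k=2)` returns the three original words followed by the two highest-scoring concepts. A text whose words have no entry in the table is returned unchanged, which is consistent with the abstract's observation that texts with more extensible feature words benefit more.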

Keywords: short text classification; semantic extension; knowledge graph; knowledge reasoning

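Classification codes: TP391 [Automation and Computer Technology - Computer Application Technology]; G35 [Automation and Computer Technology - Computer Science and Technology]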
