Authors: DING Lianhong (丁连红)[1]; SUN Bin (孙斌); ZHANG Hongwei (张宏伟). Affiliation: School of Information, Beijing Wuzi University, Beijing 101149, China
Source: Technology Intelligence Engineering (《情报工程》), 2018, No. 5, pp. 38-46 (9 pages)
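Funding: Beijing Social Science Foundation, Youth Project "Research on the Evolution Mechanism of Consumer Behavior in Social E-commerce and Measures to Guide It" (17GLC066); Beijing Wuzi University High-Level Cultivation Project (GJB20162002)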
Abstract: The Concept Graph is a large-scale knowledge graph that Microsoft built from statistical analysis of user search logs. Short texts are hard to classify because their features are sparse, they are easily affected by noise, and their topics are often unclear. To address these problems, this paper proposes a semantic extension representation for short texts based on the Concept Graph. First, the relevance between the feature words of a text and each concept in the Concept Graph is computed, and the top k most relevant concepts are selected as the concept dictionary of that text. Then, the concept dictionary is added to the set of feature words to obtain the semantically extended representation of the short text. Classification experiments were run on short texts from Twitter both before and after extension, using 5 classification algorithms and 6 relevance measures. The results show that the conceptualized semantic extension improves short-text classification, and that the more expandable feature words a text contains, the larger the improvement.
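The two-step procedure in the abstract (score concepts against a text's feature words, keep the top k as a concept dictionary, then append that dictionary to the features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `TOY_CONCEPT_GRAPH`, `concept_relevance`, and `expand_short_text` are hypothetical names, the graph is a tiny hand-made dictionary rather than the real Microsoft Concept Graph, and summing per-word association weights is only one possible relevance measure (the paper compares six, which are not reproduced here).

```python
from collections import defaultdict

# Hypothetical toy stand-in for the Microsoft Concept Graph: each term maps to
# candidate concepts with an assumed association weight (e.g. P(concept | term)).
TOY_CONCEPT_GRAPH = {
    "iphone": {"smartphone": 0.6, "device": 0.3, "brand": 0.1},
    "android": {"operating system": 0.5, "smartphone": 0.4, "platform": 0.1},
    "battery": {"component": 0.8, "device": 0.2},
}


def concept_relevance(feature_words, graph):
    """Score each concept by summing its association with every feature word.

    Summation is an illustrative assumption, not one of the paper's measures.
    """
    scores = defaultdict(float)
    for word in feature_words:
        # Words absent from the graph contribute no concepts.
        for concept, weight in graph.get(word, {}).items():
            scores[concept] += weight
    return dict(scores)


def expand_short_text(feature_words, graph, k=2):
    """Return the feature words followed by the top-k concepts (the concept dictionary)."""
    scores = concept_relevance(feature_words, graph)
    # Sort by descending score; break ties alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    concept_dictionary = [concept for concept, _ in ranked[:k]]
    return list(feature_words) + concept_dictionary


if __name__ == "__main__":
    tweet_words = ["iphone", "battery", "lol"]  # "lol" has no entry in the toy graph
    print(expand_short_text(tweet_words, TOY_CONCEPT_GRAPH, k=2))
```

Words with no entry in the graph (here, "lol") stay as ordinary features but add no concepts, which is consistent with the abstract's observation that texts with more expandable words gain more from the extension.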
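CLC numbers: TP391 [Automation and Computer Technology: Computer Application Technology]; G35 [Automation and Computer Technology: Computer Science and Technology]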