Authors: Li Xiangdong (李湘东) [1,2]; Sun Qianru (孙倩茹); Shi Jian (石健)
Affiliations: [1] School of Information Management, Wuhan University; [2] Center for Electronic Commerce Research and Development, Wuhan University, Wuhan 430072
Source: Journal of Information Resources Management (《信息资源管理学报》), 2023, No. 1, pp. 129-139 (11 pages)
Abstract: Product review texts are short and use informal wording. This study explores how to classify such texts automatically by product category and how to improve classification performance. A core word set is built from the training set using TF-IDF and LDA, the short texts are extended with features selected by Word2Vec similarity, and the extended reviews are classified with a BERT (Bidirectional Encoder Representations from Transformers) model. Comparative experiments are designed to demonstrate the effectiveness of the method. With BERT as the classifier, the proposed extension raises the F1 value by 2.1% over unextended text and by 0.9% over extension based on HowNet similarity. The reasons for the method's effectiveness are analyzed in terms of its basic principles, the different similarity calculation methods, and patterns of word usage. The method effectively improves classification performance when product reviews are organized by product category, and it can be applied to the organization of e-commerce information and to research on related theories and methods.
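Keywords: product review text; short text; feature extension; Word2Vec; BERT
Classification numbers: TP391 [Automation and Computer Technology: Computer Application Technology]; G35 [Automation and Computer Technology: Computer Science and Technology]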
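
The feature-extension step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.8 threshold, and the toy three-dimensional vectors (standing in for trained Word2Vec embeddings) are all assumptions made for illustration, and the TF-IDF/LDA core-word selection and the BERT classifier are outside its scope. The idea shown is only that a core word is appended to a short review when it is sufficiently similar to some word already in the review.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extend_short_text(tokens, core_words, vectors, threshold=0.8):
    """Append each core word that is similar enough to any token of the text.

    `threshold` is a hypothetical cut-off, not a value taken from the paper.
    Tokens or core words without a vector are skipped.
    """
    extended = list(tokens)
    for core in core_words:
        if core in extended or core not in vectors:
            continue
        if any(
            t in vectors and cosine(vectors[t], vectors[core]) >= threshold
            for t in tokens
        ):
            extended.append(core)
    return extended


# Toy embeddings standing in for Word2Vec vectors (assumption).
TOY_VECTORS = {
    "phone": [0.9, 0.1, 0.0],
    "screen": [0.8, 0.2, 0.1],
    "battery": [0.7, 0.3, 0.0],
    "plot": [0.1, 0.0, 0.8],
}
# Core words assumed to have been selected beforehand (e.g. by TF-IDF and LDA).
CORE_WORDS = ["screen", "battery", "plot"]

review = ["phone", "good"]
extended_review = extend_short_text(review, CORE_WORDS, TOY_VECTORS)
print(extended_review)
```

In this toy setup the review gains "screen" and "battery", which lie close to "phone", while "plot" is left out. With real data the vectors would come from a trained embedding model, and the extended text would then be passed to the classifier.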