Authors: 郑静 (ZHENG Jing)[1], 冯道鹏 (FENG Daopeng) (Hangzhou Dianzi University, Hangzhou 310018, China)
Source: 《现代信息科技》 (Modern Information Technology), 2023, No. 19, pp. 83-88 (6 pages)
Funding: National Social Science Project (21BTJ071).
Abstract: The traditional topic model LDA represents documents as bags of words and therefore cannot capture semantic relationships between words. The later ETM uses word embeddings to model similarity between words, but neither model can handle polysemy. To address these problems, the paper proposes a disambiguation topic model. It combines a BERT-based disambiguation method with ETM's robustness to large vocabularies, making it possible for a topic model to represent polysemous words. Experiments on general-purpose datasets verify that the proposed model performs better at sharpening topic meanings and improving topic interpretability. The model can mine topics with precise meanings, which broadens the range of applications of topic modeling.
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]