Authors: Li Shuyu (李书羽), Zhu Guangli (朱广丽)[1,3], Li Jiawei (李嘉伟), Duan Wenjie (段文杰), Zhou Ruotong (周若彤), Zhang Shunxiang (张顺香)
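Affiliations: [1] School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China; [2] School of Computer, Huainan Normal University, Huainan 232038, China; [3] Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China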
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2025, No. 2, pp. 1-11 (11 pages)
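Funding: This work is an outcome of the General Program of the National Natural Science Foundation of China (Grant No. 62076006), the Open Project of the State Key Laboratory of Cognitive Intelligence (Grant No. COGOS-2023HE02), and the Anhui Provincial University Collaborative Innovation Project (Grant No. GXXT-2021-008).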
Abstract:
[Objective] To address feature sparsity in Chinese sarcastic short texts, this paper proposes a Chinese sarcasm detection method that integrates hyperbolic representations (cues of exaggeration), mining these representations from short texts to improve the accuracy of Chinese sarcasm detection.
[Methods] First, pointwise mutual information (PMI) and semantic similarity computation are used to obtain a set of sarcasm-related co-occurring word pairs, a set of interjections, and a set of degree adverbs, which are merged into a hyperbolic representation lexicon. Next, regular expressions are matched against sarcastic texts to extract sequences of special punctuation marks, which are one-hot encoded into special punctuation features; the RoBERTa-wwm-ext model extracts the semantic features of the text; and the WoBERT model converts the words and word pairs in the hyperbolic representation lexicon into dynamic word vectors, yielding the hyperbolic representations. Finally, an improved multi-head attention mechanism attends simultaneously to the text semantic features, the hyperbolic representations, and the special punctuation features, and a Softmax function produces the detection result.
[Results] In experiments on the merged public Ciron and ChineseSarcasm-Corpus datasets, the proposed method achieves an accuracy of 81.49% and an F1 score of 81.24%.
[Limitations] The constructed hyperbolic representation lexicon depends on corpus quality and has limited generalization ability.
[Conclusions] By mining the hyperbolic representations present in Chinese sarcastic short texts and combining them with textual semantic information, the proposed method effectively enriches the semantic representation of text and improves the accuracy of Chinese sarcasm detection.
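The abstract names pointwise mutual information as one tool for collecting sarcasm-related co-occurring word pairs, but does not give the co-occurrence window, the thresholds, or the word segmenter. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: texts are already word-segmented, co-occurrence is counted at the sentence level, and the helper `pmi_pairs` and its parameters `min_count` and `threshold` are hypothetical. It scores each pair as PMI(x, y) = log2(p(x, y) / (p(x) p(y))).

```python
import math
from collections import Counter
from itertools import combinations


def pmi_pairs(sentences, min_count=2, threshold=0.0):
    """Rank word pairs by sentence-level pointwise mutual information.

    Hypothetical helper. PMI(x, y) = log2(p(x, y) / (p(x) * p(y))), where each
    probability is the fraction of sentences that contain the word(s).
    """
    n = len(sentences)
    word_df = Counter()  # number of sentences containing each word
    pair_df = Counter()  # number of sentences containing each unordered pair
    for tokens in sentences:
        vocab = sorted(set(tokens))  # sorting makes each pair key canonical
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    scores = {}
    for (x, y), c_xy in pair_df.items():
        if c_xy < min_count:  # PMI is unreliable for rare pairs, so skip them
            continue
        pmi = math.log2((c_xy / n) / ((word_df[x] / n) * (word_df[y] / n)))
        if pmi > threshold:
            scores[(x, y)] = pmi
    # Most strongly associated pairs first.
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))


# Toy, pre-segmented corpus (made up for illustration).
corpus = [
    ["真是", "太", "棒", "了"],
    ["太", "棒", "了", "呵呵"],
    ["今天", "天气", "不错"],
    ["真是", "服", "了"],
]
for pair, score in pmi_pairs(corpus).items():
    print(pair, round(score, 3))
```

The `min_count` cut-off is a common guard because PMI inflates scores for pairs seen only once; in practice the retained pairs would still need domain filtering before entering a lexicon.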
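The abstract extracts special punctuation with regular expressions and one-hot encodes the resulting sequence, but does not say which marks count as "special" or how sequences of different lengths are aligned. A minimal sketch using Python's standard `re` module, assuming a hypothetical mark inventory (`SPECIAL_PUNCT`) and a fixed sequence length (`max_len`), both chosen purely for illustration:

```python
import re

# Hypothetical inventory of "special" punctuation marks; the abstract does not
# list the set the authors actually used.
SPECIAL_PUNCT = ["！", "？", "…", "～", "!", "?", "~"]
PUNCT_RE = re.compile("[" + "".join(re.escape(p) for p in SPECIAL_PUNCT) + "]")
PUNCT_INDEX = {p: i for i, p in enumerate(SPECIAL_PUNCT)}


def punct_sequence(text):
    """Return the special punctuation marks in `text`, in order of appearance."""
    return PUNCT_RE.findall(text)


def one_hot_punct(text, max_len=8):
    """One-hot encode the punctuation sequence as a max_len x len(SPECIAL_PUNCT) matrix.

    Longer sequences are truncated; shorter ones are padded with all-zero rows,
    so every text yields a matrix of the same shape.
    """
    rows = []
    for mark in punct_sequence(text)[:max_len]:
        row = [0] * len(SPECIAL_PUNCT)
        row[PUNCT_INDEX[mark]] = 1
        rows.append(row)
    rows.extend([0] * len(SPECIAL_PUNCT) for _ in range(max_len - len(rows)))
    return rows


print(punct_sequence("真是太棒了！！？"))
print(one_hot_punct("真是太棒了！！？", max_len=4))
```

Fixing `max_len` gives every text a feature matrix of the same shape, which a downstream attention layer needs; the value itself is arbitrary here.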
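The abstract states that an improved multi-head attention mechanism attends to the semantic, hyperbolic, and punctuation features at once, but it does not describe the improvement. As a point of reference only, here is plain (unimproved) multi-head scaled dot-product attention in pure Python: one query, assumed here to be the semantic feature, attends over the three feature vectors, and a linear layer plus Softmax yields class probabilities. Further assumptions: all three features have already been projected to a common dimension d, the learned Q/K/V projections are replaced by identity, the numbers are toy values, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_fuse(query, sources, num_heads):
    """Multi-head scaled dot-product attention from one query over a few feature vectors.

    query:   length-d vector (assumed: the sentence-level semantic feature).
    sources: list of length-d vectors (assumed: semantic, hyperbolic, punctuation),
             already projected to the same dimension d.
    Learned projections are omitted (identity) to keep the sketch short.
    Returns the fused length-d vector and each head's attention weights.
    """
    d = len(query)
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = d // num_heads
    fused, head_weights = [], []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = query[lo:hi]
        # Scaled dot-product score of this head's query slice against each source slice.
        scores = [sum(a * b for a, b in zip(q, src[lo:hi])) / math.sqrt(head_dim)
                  for src in sources]
        weights = softmax(scores)
        head_weights.append(weights)
        # Weighted sum of the source slices; head outputs are concatenated in order.
        for j in range(lo, hi):
            fused.append(sum(w * src[j] for w, src in zip(weights, sources)))
    return fused, head_weights


def classify(fused, weight_rows, bias):
    """Linear layer followed by Softmax, giving class probabilities."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weight_rows, bias)]
    return softmax(logits)


# Toy 4-dimensional features (made-up numbers) for one text.
semantic = [0.9, 0.1, 0.4, 0.2]
hyperbolic = [0.8, 0.0, 0.7, 0.5]
punctuation = [0.0, 0.0, 1.0, 1.0]
fused, weights = multi_head_fuse(semantic, [semantic, hyperbolic, punctuation], num_heads=2)
probs = classify(fused, [[0.5, -0.2, 0.3, 0.1], [-0.5, 0.2, -0.3, -0.1]], [0.0, 0.0])
print(weights)
print(probs)  # two class probabilities from untrained toy weights
```

Splitting the dimension across heads lets each head weight the three feature sources differently; whatever modification the paper makes to this baseline is not recoverable from the abstract.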
Keywords: Chinese sarcasm domain lexicon; hyperbolic representation; RoBERTa-wwm-ext; multi-head attention mechanism
CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]