Authors: ZHANG Yang (张洋), JIANG Minghu (江铭虎)[1]
Affiliation: [1] Computational Linguistics Laboratory, Department of Chinese, School of Humanities, Tsinghua University, Beijing 100084, China
Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 2023, No. 9, pp. 1390-1398 (9 pages)
Funding: Key Program of the National Natural Science Foundation of China (62036001).
Abstract: Authorship identification is an interdisciplinary field that infers the author of an unknown text by analyzing its writing style. Existing research relies mostly on character and lexical features, and syntactic relation information is rarely used. This paper proposes an authorship identification method based on syntax tree node embedding: each node of the syntax tree is represented as the sum of the embeddings of all its dependency arcs, which brings dependency relation information into the deep learning model. A syntactic attention network is then built to produce a syntax-aware vector that fuses dependency relations, part-of-speech tags, and words. A sentence attention network then yields the sentence representation, and a classifier makes the final prediction. In experiments on three English datasets, the method ranks second or third. More importantly, introducing dependency syntax combinations offers more directions for interpreting the model.
[Objective] Authorship identification infers the authorship of an unknown text by analyzing its stylometry, or writing style. Traditional research on authorship identification is generally based on empirical knowledge from literature or linguistics, whereas modern research mostly relies on mathematical methods to quantify an author's writing style. Researchers have proposed various feature combinations and neural network models. Some feature combinations achieve better results with traditional machine learning classifiers, while some neural network models learn the relationship between the input text and the corresponding author on their own and extract text features implicitly. However, current research focuses mostly on character and lexical features, and the exploration of syntactic features is limited. How to use the dependency relationships between the words in a sentence, and how to combine syntactic features with neural networks, remains unclear. This paper proposes an authorship identification method based on syntax tree node embedding, which introduces syntactic features into a deep learning model.
[Methods] We believe that an author's writing style is reflected mainly in how they choose words and construct sentences. This paper therefore develops the authorship identification model from the perspectives of words and sentences, and uses the attention mechanism to construct sentence-level features. First, an embedding representation of the syntax tree node is proposed: the node is expressed as the sum of the embeddings corresponding to all its dependency arcs. In this way, information on sentence structure and on the association between words is introduced into the neural network model. Then, a syntactic attention network is constructed that uses different embedding methods to vectorize text features such as dependencies, part-of-speech tags, and words, and a syntax-aware vector is obtained through this network. Furthermore, the sentence attention network is used to e
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
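The node representation described in the abstract (a syntax tree node expressed as the sum of the embeddings of all its dependency arcs) can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not say how arcs are keyed, so the sketch assumes each arc contributes one embedding to its dependent (keyed as "in") and one to its head (keyed as "out"). The function names, the random embedding table, and the example parse are all hypothetical.

```python
import numpy as np


def make_arc_table(labels, dim, seed=0):
    """Hypothetical embedding table: one random vector per (relation, direction).

    In a trained model these vectors would be learned parameters.
    """
    rng = np.random.default_rng(seed)
    return {(rel, d): rng.normal(size=dim) for rel in labels for d in ("in", "out")}


def node_embeddings(heads, relations, arc_table):
    """Represent each token (tree node) as the sum of its arc embeddings.

    heads[i]     -- index of the head of token i, or -1 for the root
    relations[i] -- label of the arc that attaches token i to its head
    Assumption: an arc adds its "in" vector to the dependent and its
    "out" vector to the head.
    """
    dim = next(iter(arc_table.values())).shape[0]
    nodes = np.zeros((len(heads), dim))
    for child, (head, rel) in enumerate(zip(heads, relations)):
        nodes[child] += arc_table[(rel, "in")]
        if head >= 0:
            nodes[head] += arc_table[(rel, "out")]
    return nodes


# Hand-written parse of "She reads books": "reads" is the root,
# "She" attaches to it as nsubj and "books" as obj.
heads = [1, -1, 1]
relations = ["nsubj", "root", "obj"]
arc_table = make_arc_table(["nsubj", "root", "obj"], dim=4)
nodes = node_embeddings(heads, relations, arc_table)
```

Here `nodes[1]`, the vector for "reads", is the sum of three arc embeddings (its own root arc plus the two arcs to its dependents), while each leaf token receives a single arc embedding. In the method as summarized, such vectors would then be combined with part-of-speech and word embeddings inside the syntactic attention network; that step is not shown.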