Authors: WU Haiyan (吴海燕); LIU Ying (刘颖) [1]
Affiliation: [1] School of Humanities, Tsinghua University, Beijing 100084, China
Source: Journal of Computer Applications (《计算机应用》), 2020, No. 8, pp. 2171-2181 (11 pages)
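Funding: National Social Science Fund of China (18ZDA238); General Project of Humanities and Social Sciences Research, Ministry of Education of China (17YJAZH056); Beijing Social Science Fund (16YYB021).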
Abstract: Mining the features that distinguish different registers in a large-scale corpus is difficult and requires considerable expert knowledge and manual effort. To address this problem, a method for automatically mining register-distinguishing features was proposed. First, each register was represented by words, parts of speech, punctuation marks, their bigrams, syntactic structures, and multiple combinations of these features. Then, a model combining an attention mechanism with a Multi-Layer Perceptron (MLP), i.e., an attention network, was used to classify texts into three registers: novels, news, and textbooks. During classification, the features that best help to distinguish the registers were extracted automatically. Finally, further analysis of these features yielded the characteristics of each register and several linguistic conclusions. Experimental results show that novels, news, and textbooks differ significantly in words, topic words, word dependencies, parts of speech, punctuation, and syntactic structures. This further indicates that, because communicative audiences, purposes, content, and settings differ, people naturally vary in how they use words, parts of speech, punctuation, and syntax.
Keywords: register feature mining; discriminability of register features; attention mechanism; Multi-Layer Perceptron (MLP)
CLC number: TP391.1 [Automation and Computer Technology - Computer Application Technology]
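The abstract describes two steps: representing text as word, part-of-speech, and punctuation features plus their bigrams, and then classifying register with an attention network whose attention weights reveal which features matter. The following is a minimal, hypothetical sketch of that idea in pure Python, not the authors' implementation. Assumptions: the input is already segmented and POS-tagged (the tag set `r`/`v`/`u`/`w` is illustrative), the feature-name prefixes (`w=`, `p2=`, etc.), the class `AttentionMLP`, and the helper `top_features` are invented for illustration, attention is a single learned query vector over feature embeddings (one plausible arrangement), and syntactic-structure and dependency features are omitted. Training (for example, cross-entropy over the three register labels) is also omitted, so the weights below are random and the ranking only becomes meaningful after training.

```python
import math
import random

# Hypothetical punctuation inventory (full-width Chinese and ASCII marks).
PUNCT = set("，。！？；：、“”（）,.!?;:")
REGISTERS = ["novel", "news", "textbook"]


def extract_features(tokens):
    """tokens: list of (word, pos) pairs -> list of feature strings.

    Builds word, POS, and punctuation unigrams plus word and POS bigrams.
    """
    words = [w for w, _ in tokens]
    tags = [p for _, p in tokens]
    feats = [f"w={w}" for w in words] + [f"p={p}" for p in tags]
    feats += [f"punct={w}" for w in words if w in PUNCT]
    feats += [f"w2={a}_{b}" for a, b in zip(words, words[1:])]
    feats += [f"p2={a}_{b}" for a, b in zip(tags, tags[1:])]
    return feats


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionMLP:
    """Attention pooling over feature embeddings followed by a one-hidden-layer MLP.

    Hypothetical illustration only; weights are random (untrained).
    """

    def __init__(self, dim=8, hidden=16, n_classes=3, seed=0):
        self.rng = random.Random(seed)
        self.dim = dim
        self.emb = {}  # feature string -> embedding vector, created lazily
        self.query = [self.rng.gauss(0, 1) for _ in range(dim)]
        self.w1 = [[self.rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(hidden)]
        self.w2 = [[self.rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(n_classes)]

    def embed(self, feat):
        if feat not in self.emb:
            self.emb[feat] = [self.rng.gauss(0, 1) for _ in range(self.dim)]
        return self.emb[feat]

    def forward(self, feats):
        """Return (class probabilities, attention weight per feature occurrence)."""
        vecs = [self.embed(f) for f in feats]
        # Attention score = dot product of each feature embedding with the query.
        scores = [sum(q * v for q, v in zip(self.query, vec)) for vec in vecs]
        alpha = softmax(scores)
        # Attention-weighted sum of feature embeddings.
        pooled = [sum(a * vec[i] for a, vec in zip(alpha, vecs)) for i in range(self.dim)]
        # MLP: ReLU hidden layer, then linear layer to register logits.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, pooled))) for row in self.w1]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return softmax(logits), alpha


def top_features(feats, alpha, k=5):
    """Sum attention weight per distinct feature and return the k highest."""
    weight = {}
    for f, a in zip(feats, alpha):
        weight[f] = weight.get(f, 0.0) + a
    return sorted(weight.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    sample = [("他", "r"), ("笑", "v"), ("了", "u"), ("。", "w")]
    feats = extract_features(sample)
    probs, alpha = AttentionMLP(seed=0).forward(feats)
    print(dict(zip(REGISTERS, probs)))
    print(top_features(feats, alpha, k=3))
```

Summing attention per distinct feature string means a feature that occurs several times in a text accumulates weight. Aggregating these per-text rankings over a whole corpus, per register, is one way to surface candidate distinguishing features for the kind of linguistic analysis the abstract describes.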