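Exploring the Global Range of Segments That Clause Recognition Depends On: Chinese Clause Recognition Based on the Pre-trained Language Model Bert  (Cited by: 2)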

Detecting the Global Range of Segments for Clause Recognition with Bert


Authors: 冯文贺, 高子雄, 张文娟 (FENG Wenhe; GAO Zixiong; ZHANG Wenjuan)

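Affiliations: [1] Center for Linguistics and Applied Linguistics and Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510420; [2] Guangdong University of Foreign Studies, Guangzhou 510420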

Source: Applied Linguistics (《语言文字应用》), 2022, No. 2, pp. 111-121 (11 pages)

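Funding: Supported by the National Social Science Fund of China project "Research on the Feature-Dependency Description Mechanism and Resource Construction of Chinese Discourse Structure" (17BYY036).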

Abstract: Clause recognition is a basic problem in discourse information processing. Linguistically, whether a segment counts as a clause depends not only on its internal structure but also on its function in the surrounding global context. The open question is how large a global range of segments clause recognition should generally rely on. This paper explores the question through Chinese clause recognition. Chinese clauses are generally delimited by punctuation at both ends, but not every punctuation mark delimits a clause. The paper therefore treats clause recognition as a punctuation classification problem, reduces the global range that recognition depends on to the number of segments before and after a punctuation mark, and probes the relationship between the size of that range and recognition performance. Text features of the segments on both sides of the punctuation mark are extracted with the pre-trained language model Bert. The experiments show three things. First, recognition improves as the number of segments grows, with the best results at four segments on each side of the punctuation mark. Second, segments before the punctuation mark contribute more than segments after it, and segments on both sides contribute more than segments on one side only. Third, after optimizing the global length and the feature weights of the preceding and following segments, the best model reaches an F1 of 95.19% on clause recognition.

Keywords: clause recognition; discourse analysis; global range of segments; Chinese information processing

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
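The abstract frames clause recognition as classifying each punctuation mark using a window of segments before and after it. The following is a minimal sketch of how such windows could be built, not the authors' implementation: the punctuation inventory `PUNCT` and the helper names `split_at_punctuation` and `context_windows` are assumptions made for illustration, and the feature extraction with Bert and the classifier itself are left out.

```python
import re
from typing import List, Tuple

# Assumed punctuation inventory; the abstract does not list the marks actually used.
PUNCT = ",。;:?!"


def split_at_punctuation(text: str) -> Tuple[List[str], List[str]]:
    """Return (segments, marks), where marks[i] is the mark that follows segments[i]."""
    # The capturing group keeps the punctuation marks in the split result.
    parts = re.split("([" + re.escape(PUNCT) + "])", text)
    segments, marks = [], []
    for i in range(0, len(parts) - 1, 2):
        segments.append(parts[i])
        marks.append(parts[i + 1])
    if parts[-1]:
        # A trailing segment with no punctuation after it.
        segments.append(parts[-1])
    return segments, marks


def context_windows(
    text: str, k_before: int, k_after: int
) -> List[Tuple[List[str], str, List[str]]]:
    """For each mark, collect up to k_before segments before it and k_after after it."""
    segments, marks = split_at_punctuation(text)
    windows = []
    for i, mark in enumerate(marks):
        left = segments[max(0, i + 1 - k_before): i + 1]
        right = segments[i + 1: i + 1 + k_after]
        windows.append((left, mark, right))
    return windows


if __name__ == "__main__":
    sample = "他说,今天下雨,所以不去了。"
    for left, mark, right in context_windows(sample, k_before=4, k_after=4):
        print(left, mark, right)
```

Each `(left, mark, right)` triple would then be encoded and labeled as clause boundary or not. Setting `k_before` and `k_after` independently mirrors the abstract's comparison of one-sided and two-sided context, with four segments on each side reported as the best setting.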

 
