Corpus Construction of Chinese Rhetorical Questions Oriented to News Comment (面向新闻评论的汉语反问句语料库构建)  Cited by: 4


Authors: LI Xiang, ZHU Xiaoxu[1], LIU Chengwei (School of Computer Science and Technology, Soochow University, Suzhou 215006, China)

Affiliation: [1] School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China

Source: Journal of Shanxi University (Natural Science Edition), 2021, No. 3, pp. 403-410 (8 pages)

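Funding: National Natural Science Foundation of China (61836007, 61772354, 61773276); Priority Academic Program Development of Jiangsu Higher Education Institutions.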

Abstract: Rhetorical questions are a common form of expression in Chinese and carry rich emotional coloring, so identifying them correctly should improve the results of tasks such as sentiment analysis. Drawing on semi-supervised learning and active learning, this paper proposes a semi-automatic method for collecting rhetorical questions and uses it to build a Chinese rhetorical question corpus oriented to news comments, containing more than 6,000 sentences. The paper further analyzes the characteristics of the corpus and runs rhetorical question recognition experiments on several models using syntactic path features and position features. The results show that the corpus can be used to train a high-performance rhetorical question recognition model, with precision, recall, and F1-measure of 90.79%, 93.57%, and 91.30%, respectively. The experiments also confirm the effectiveness of syntactic path features and position features for identifying rhetorical questions.
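The abstract does not spell out how semi-supervised and active learning are combined in the collection method. As a minimal sketch, assuming one common pairing of self-training (semi-supervised) with uncertainty sampling (active learning), a single collection round might route unlabeled sentences as follows: confident positives become pseudo-labels, confident negatives are dropped, and the least certain sentences go to a human annotator. The function `route_candidates`, the thresholds `high` and `low`, the `budget`, and the toy probabilities are all hypothetical illustrations, not the authors' procedure.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical routing step for one round of semi-automatic corpus collection.
# Assumption: `score` returns a classifier's estimated P(rhetorical question).
# This is NOT the paper's published procedure, only an illustrative sketch.
def route_candidates(
    sentences: List[str],
    score: Callable[[str], float],
    high: float = 0.9,  # hypothetical threshold for auto-accepting positives
    low: float = 0.1,   # hypothetical threshold for auto-rejecting negatives
    budget: int = 2,    # hypothetical number of sentences a human labels per round
) -> Dict[str, List[str]]:
    auto_pos: List[str] = []  # pseudo-labeled positives (self-training)
    auto_neg: List[str] = []  # confidently non-rhetorical, dropped
    uncertain: List[Tuple[float, str]] = []
    for s in sentences:
        p = score(s)
        if p >= high:
            auto_pos.append(s)
        elif p <= low:
            auto_neg.append(s)
        else:
            # Distance from 0.5: a smaller value means the model is less certain.
            uncertain.append((abs(p - 0.5), s))
    # Uncertainty sampling: send the least certain sentences to the annotator.
    uncertain.sort(key=lambda t: t[0])
    query = [s for _, s in uncertain[:budget]]
    return {"auto_pos": auto_pos, "auto_neg": auto_neg, "query": query}


if __name__ == "__main__":
    # Toy probabilities standing in for a trained classifier's output.
    toy_probs = {
        "难道你不知道吗？": 0.95,
        "今天天气很好。": 0.02,
        "这样做有什么用？": 0.55,
        "你吃饭了吗？": 0.25,
        "谁说不是呢？": 0.70,
    }
    pools = route_candidates(list(toy_probs), lambda s: toy_probs[s])
    for name, items in pools.items():
        print(name, items)
```

With these toy numbers, 难道你不知道吗？ becomes a pseudo-labeled positive, 今天天气很好。 is discarded, and 这样做有什么用？ and 谁说不是呢？ are queued for annotation. In a full loop the classifier would then be retrained on the enlarged labeled set and the round repeated.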

Keywords: rhetorical question corpus; semi-supervised learning; active learning; syntactic path features; position features

Classification number: H08 [Language and Writing / Linguistics]
