A Study on the Automatic Text Readability Assessment of Reading Texts in Hanyu Shuiping Kaoshi (HSK)

Cited by: 8

Authors: DU Yueming; WANG Yamin; WANG Lei [2]

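Affiliations: [1] School of Chinese as a Second Language, Peking University (北京大学对外汉语教育学院), Beijing 100871; [2] College of Intensive Chinese Training, Beijing Language and Culture University (北京语言大学汉语速成学院), Beijing 100083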

Source: Applied Linguistics (《语言文字应用》), 2022, No. 3, pp. 73-86 (14 pages)

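Funding: Supported by the Major Program of the National Social Science Fund of China, "Research on Teaching Innovation for the China Overview Course and Its Digital Course Construction for Confucius Institutes Worldwide" (18ZDA339).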

Abstract: Drawing on a feature set for the readability of Chinese as a second language (CSL) texts, this paper compares six machine learning models and applies feature selection algorithms to automatically assess the readability of Hanyu Shuiping Kaoshi (HSK) reading texts. The experiments show that the support vector machine performed best among the six models. The full-feature model, which combines character, lexical, syntactic, and discourse features, reached a prediction accuracy of 0.876. Predictive power differed across linguistic levels, with lexical-level features performing best. After redundant features were eliminated, 18 features from the lexical and character levels entered the optimal model, whereas no syntactic or discourse features did. The study offers a reference for selecting and adapting HSK reading texts and for assessing the readability of other types of texts.
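As a rough illustration of what character-level and lexical-level features can look like, the sketch below computes a handful of surface statistics from a word-segmented text. The function name `readability_features`, the five features it returns, and the sample sentence are all assumptions made for this example; the abstract does not list the 18 features in the paper's optimal model, and this is not the authors' feature set. Word segmentation is assumed to have been done beforehand.

```python
from collections import Counter


def readability_features(tokens):
    """Compute a few illustrative character- and lexical-level features.

    `tokens` is a list of already-segmented words. Segmentation is assumed
    to happen upstream; this sketch does not segment Chinese text itself.
    The features are hypothetical examples, not the paper's feature set.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")

    # Flatten the words into individual characters.
    chars = [ch for tok in tokens for ch in tok]
    char_counts = Counter(chars)
    word_counts = Counter(tokens)

    return {
        # Character level: text length and character diversity.
        "n_chars": len(chars),
        "char_type_token_ratio": len(char_counts) / len(chars),
        # Lexical level: word count, word diversity, average word length.
        "n_words": len(tokens),
        "word_type_token_ratio": len(word_counts) / len(tokens),
        "mean_word_length": len(chars) / len(tokens),
    }


# Hypothetical pre-segmented sentence: "我 喜欢 学习 汉语 我 也 喜欢 中国"
example = ["我", "喜欢", "学习", "汉语", "我", "也", "喜欢", "中国"]
features = readability_features(example)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Vectors like this one, computed per text and paired with HSK level labels, are the kind of input a classifier such as a support vector machine would take; the choice of features and any feature selection step would need to follow the paper itself.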

Keywords: text readability; HSK reading texts; linguistic features; machine learning; support vector machine

Classification code: H087 [Language and Writing: Linguistics]
