Authors: 杜月明 (DU Yueming); 王亚敏 (WANG Yamin); 王蕾 (WANG Lei) [2]
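Affiliations: [1] School of Chinese as a Second Language, Peking University, Beijing 100871; [2] College of Intensive Chinese Training, Beijing Language and Culture University, Beijing 100083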
Source: 《语言文字应用》 (Applied Linguistics), 2022, No. 3, pp. 73-86 (14 pages)
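Funding: Supported by the Major Project of the National Social Science Fund of China, "Research on Teaching Innovation for the China Overview Course at Confucius Institutes Worldwide and the Development of Its Digital Courses" (18ZDA339).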
Abstract: Building on a feature set for the readability of Chinese as a second language (CSL) texts, this paper compares six machine learning models and introduces feature selection algorithms to automatically assess the readability of Hanyu Shuiping Kaoshi (HSK) reading texts. The experiments show that the support vector machine performs best on this task. The full-feature model, which draws on character, lexical, syntactic, and discourse features, reaches a prediction accuracy of 0.876. Predictive power differs across linguistic levels, with lexical-level features performing best. After redundant features are eliminated, 18 features from the lexical and character levels enter the optimal model, while syntactic and discourse features do not. The study offers a reference for selecting and adapting HSK reading texts and for assessing the readability of other types of texts.