Chinese Composition Evaluation Ability Test of Domestic Large Language Model

Authors: WEI Shun-Ping; ZHANG Yue; RAN Rou (School of Education, Minzu University of China, Beijing, China 100081; Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education, Beijing, China 100039)

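Affiliations: [1] School of Education, Minzu University of China, Beijing 100081 [2] Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education, Beijing 100039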

Source: Modern Educational Technology, 2025, Issue 3, pp. 24-33 (10 pages)

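Funding: A phased research outcome of the 2024 Innovation Fund project "Research on the Governance and Application of High-Quality Data Resources in the Field of Lifelong Education Oriented to Artificial Intelligence" (Project No. 1441001) of the Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education.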

Abstract: As the latest technological achievement of artificial intelligence, large language models will have a profound impact on the shape of education in the digital intelligence age. To investigate the composition evaluation ability of large language models, this paper selected 500 primary school Chinese compositions, designed 37 prompts, and used two domestic large language models, "Zhipu AI" and "Xunfei Xinghuo", as test tools, evaluating them on two aspects: scoring and comments. The findings were as follows. On the usability of scoring, the scores given by the domestic large language models were only weakly correlated with the original scores. On the stability of scoring, the correlation between two successive scores from the models was low and stability was poor, whereas the correlation between two successive ratings was high and stability was better. On the accuracy of comments, the models' composition comments were more accurate for content selection and text structure. On the stability of comments, the models' comments were generative, and the similarity between two successively generated comments was low. Finally, the paper puts forward suggestions for applying large language models in Chinese language education, to help teachers better carry out human-computer collaborative teaching.
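The stability and usability checks described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not name the correlation coefficient, the grade cutoffs, or the comment-similarity measure used in the paper, so the sketch uses Pearson correlation, hypothetical cutoffs, and the `difflib` similarity ratio from the Python standard library, applied to made-up scores.

```python
from difflib import SequenceMatcher
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def to_grade(score, cutoffs=(90, 75, 60)):
    """Map a 0-100 score to an ordinal grade from 0 to 3 (3 is best).

    The cutoffs are hypothetical, not taken from the paper.
    """
    return sum(score >= c for c in cutoffs)


def comment_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    # Made-up scores for five compositions (not data from the paper).
    original = [85, 72, 90, 60, 78]
    run_1 = [80, 75, 88, 70, 74]
    run_2 = [76, 79, 91, 68, 77]

    print("usability (run 1 vs original):", pearson(run_1, original))
    print("score stability (run 1 vs run 2):", pearson(run_1, run_2))
    grades_1 = [to_grade(s) for s in run_1]
    grades_2 = [to_grade(s) for s in run_2]
    print("grades, run 1:", grades_1, "run 2:", grades_2)
    print("comment similarity:",
          comment_similarity("Clear structure, vivid detail.",
                             "The structure is clear but detail is thin."))
```

Coarsening scores into a few grade bands discards small score fluctuations, which is one plausible reason ratings could look more stable than raw scores across repeated runs.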

Keywords: large language model; primary school Chinese; composition evaluation; human-computer collaboration

Classification code: G40-057 [Cultural Science: Principles of Education]

 
