Research on the Application of Large Language Models in Intelligent Assessment of Chinese Writing  Cited by: 3

Practical Exploration of Large Language Models in Chinese Automated Essay Evaluation


Authors: XUE Siyuan; ZHOU Jianshe (Institute of Linguistics, Chinese Academy of Social Sciences, Beijing, China 100005; Research Center for Language Intelligence of China, Capital Normal University, Beijing, China 100089)

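Affiliations: [1] Institute of Linguistics, Chinese Academy of Social Sciences, Beijing 100005 [2] Research Center for Language Intelligence of China, Capital Normal University, Beijing 100089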

Source: Journal of Kunming University (《昆明学院学报》), 2024, No. 2, pp. 10-22 (13 pages)

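Funding: State Language Commission "14th Five-Year Plan" research special project of the Global Chinese Learning Alliance, "Research on AI-Empowered Chinese Learning: Representation and Intelligent Assessment of the Logical Structure of Chinese Discourse" (YB145-16); State Language Commission key project, "Research on the Theory and Key Technologies of Intelligent Assessment of Chinese Expressive Ability" (ZDI145-92); "14th Five-Year Plan" research planning project, "Research on Intelligent Recognition of Chinese Rhetorical Ability for the Compulsory Education Stage" (YB145-56); China Association for Educational Technology major project, "Development of the Chinese Expressive Ability (CEA) Standard and Innovative Research on Its Intelligent Assessment Applications" (XJJ202205003).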

Abstract: This study evaluates the performance of large language models on two typical tasks in intelligent writing assessment: automated essay scoring and automated feedback generation. Focusing on learners of Chinese as a second language, it uses three prompting strategies (standard prompting, chain-of-thought prompting, and self-consistent chain-of-thought prompting) to test the effectiveness of large language models in scoring essays and generating feedback. The results show that although large language models demonstrate some potential in automated essay scoring, their stability and reliability still need improvement; continued optimization of these prompting strategies can nevertheless significantly strengthen the models' ability to score writing and generate feedback. Moreover, the three prompting strategies yield different effects, and evaluating large language models through prompting involves subjectivity. The models therefore cannot yet fully replace teachers in conducting assessment independently, but at this stage they can serve as auxiliary tools that improve teachers' efficiency in assessing compositions. The findings provide strong support for applying large language models to the intelligent assessment of Chinese writing, highlight their potential value in enhancing the performance of assessment systems, and offer a reference for developing more efficient and accurate intelligent assessment systems for Chinese writing.
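As an illustration of the self-consistent chain-of-thought strategy named in the abstract, the sketch below shows the general idea of sampling several independently reasoned scores for one essay and keeping the majority vote. It is not the authors' implementation: the function names, the integer score scale, the number of samples, and the tie-breaking rule are all assumptions made for this example, and `sample_fn` stands in for whatever model call a real system would use.

```python
from collections import Counter
from typing import Callable, Iterable, List


def self_consistent_score(
    sample_fn: Callable[[str], int],  # hypothetical: one sampled model score per call
    essay: str,
    n_samples: int = 5,               # assumed sample count, not taken from the paper
) -> int:
    """Sample several scores for the same essay and return the majority vote."""
    scores: List[int] = [sample_fn(essay) for _ in range(n_samples)]
    counts = Counter(scores)
    top = max(counts.values())
    # Assumption: ties are broken toward the lower score (a conservative choice).
    return min(score for score, count in counts.items() if count == top)


def scripted_sampler(outputs: Iterable[int]) -> Callable[[str], int]:
    """Hypothetical stand-in for a model: replays a fixed sequence of scores."""
    iterator = iter(outputs)
    return lambda essay: next(iterator)


if __name__ == "__main__":
    sampler = scripted_sampler([3, 4, 4, 5, 4])
    print(self_consistent_score(sampler, "example essay text"))
```

Voting over several samples is one way to address the stability concern raised in the abstract, since a single sampled score can vary from run to run.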

Keywords: intelligent writing assessment; automated essay scoring; intelligent feedback generation; large language models; ChatGLM

Classification number: H102 [Language and Script: Chinese]

 
