Exploration of Automated L2 Writing Evaluation Based on GPT-4: Taking IELTS Writing Task 2 as an Example


Authors: DONG Yanyun [1]; QI Xinyang; MA Xiaomei [1]

Affiliation: [1] Xi'an Jiaotong University

Source: Language Testing and Assessment, 2024, No. 2, pp. 13-30 (18 pages)

Abstract: This study explores the assessment capability of GPT-4 for small-sample L2 writing. Taking IELTS Writing Task 2 as an example, the research adopts a prompt-engineering strategy and designs six distinct prompts. By examining data distribution, inter-rater correlation, and inter-rater agreement, it analyzes the scoring performance of GPT-4 under different prompt windows on the experimental set. The findings are twofold. First, the "minimal + criteria + examples" prompt yields the best results, a finding further verified on the validation set; under this optimal prompt, GPT-4's scores show strong agreement and a strong correlation with the examiner's scores. Second, an information discrepancy exists between the examiner's comments on the one hand and the scoring criteria and calibration examples on the other; including the comments in the prompts may interfere with GPT-4's assessment, so they are not recommended as prompt material. This study aims to contribute empirical insights into the practical application of GPT-4 for writing evaluation in educational settings and to offer a foundation for further exploration and implementation in classroom contexts.

Keywords: GPT-4; IELTS Writing Task 2; automated essay scoring; inter-rater agreement

Classification: G63 [Culture and Science—Education]

 
