Research on Language Model-assisted Construction of a Corpus of Move Structures in Abstracts of English Scientific Articles

Authors: Li Hong-zheng (李洪政); Wang Ruo-jin (王若锦); Liu Fang (刘芳) [1,2]; Feng Chong (冯冲)

Affiliations: [1] School of Foreign Languages, Beijing Institute of Technology, Beijing 102488, China; [2] Key Laboratory of Language, Cognition and Computation, Ministry of Industry and Information Technology, Beijing 102488, China; [3] School of Computer Science, Beijing Institute of Technology, Beijing 100081, China

Source: Foreign Language Research (《外语学刊》), 2025, No. 1, pp. 29-38 (10 pages)

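Funding: This work is an interim result of the National Social Science Fund of China General Project "Research on the Construction of a Deep Learning-based Intelligent Evaluation System for English Scientific Article Writing" (23BYY166) and of the Beijing Institute of Technology Academic Start-up Program for Young Teachers project "Research on Scarce-type Language Resource Construction and Machine Translation".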

Abstract: Move structures are discourse units in research articles (RAs) and are of considerable value in areas such as English for Academic Purposes (EAP). Although research on moves in academic articles is abundant, move-annotated data resources remain relatively scarce. Using language models from natural language processing (NLP), this study constructed a move-annotated corpus of English scientific article abstracts that covers multiple disciplines and contains nearly 34,000 move structures. The first stage of corpus construction relied on manual expert annotation to produce high-quality data. The second and main stage adopted a BERT-based automatic annotation model, which increased annotation speed and expanded annotation scale while maintaining annotation quality. The study then ran move recognition experiments on abstracts, comparing the automatic annotation model with the large language models (LLMs) ChatGPT and Claude 3 on recognizing move structures across disciplines; the results confirmed the value of both the model and the corpus. The corpus can provide necessary data resources for NLP tasks such as intelligent correction of scientific article writing, and for foreign language teaching and research such as EAP. The study also verifies the feasibility of LLM-assisted construction of language resources, shows the importance of smart foreign language education driven by language intelligence, and can effectively promote the digital transformation of foreign language education.

Keywords: move structure; corpus; abstract text; large language model

Classification number: H08 [Language and Script: Linguistics]

 
