Structural Recognition of Abstracts of Academic Text Enhanced by Domain Bilingual Data

Times cited: 6


Authors: Liu Jiangfeng, Feng Yutong, Liu Liu, Shen Si, Wang Dongbo [1] (College of Information Management, Nanjing Agricultural University, Nanjing 210095, China; School of Economics & Management, Nanjing University of Science and Technology, Nanjing 210094, China)

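Affiliations: [1] College of Information Management, Nanjing Agricultural University, Nanjing 210095; [2] School of Economics & Management, Nanjing University of Science and Technology, Nanjing 210094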

Source: Data Analysis and Knowledge Discovery (数据分析与知识发现), 2023, Issue 8, pp. 105-118 (14 pages)

Funding: One of the research outcomes of a National Natural Science Foundation of China project (Grant No. 71974094).

Abstract:
[Objective] To accurately capture the core content of social science academic literature and improve the recognition of rhetorical move structure in paper abstracts.
[Methods] Pre-trained language models were tested on bilingual abstract data from several core journals in library and information science. The paper proposes a method that uses domain data for enhanced learning at three stages: pre-training, fine-tuning, and the model's output layer.
[Results] On single-journal data, enhanced pre-training with domain bilingual data, fine-tuning with domain bilingual data, and fusion of bilingual sentence classification probabilities improved the F1 score of abstract structure recognition by about 1 to 2, 1, and 0.5 to 1 percentage points, respectively.
[Limitations] Due to limited computing resources, continued pre-training on domain data was not carried out on cross-lingual pre-trained models, so its effect on their performance was not tested.
[Conclusions] This study makes full use of the bilingual resources in academic literature and effectively improves move structure recognition in abstracts, which can help readers quickly grasp the content of a paper and promote scholarly communication.
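The abstract does not say how the two sentence-level classifiers' outputs are combined. As a minimal sketch, assuming a simple weighted average with a hypothetical weight `alpha` and a hypothetical five-move label set, fusing the class probabilities of one aligned bilingual sentence pair could look like this (the names `MOVES`, `fuse_probabilities`, and `predict_move` are illustrative, not from the paper):

```python
from typing import List, Sequence

# Hypothetical move labels; the paper's actual label set may differ.
MOVES = ["Objective", "Methods", "Results", "Limitations", "Conclusions"]


def fuse_probabilities(p_zh: Sequence[float], p_en: Sequence[float],
                       alpha: float = 0.5) -> List[float]:
    """Weighted average of two class-probability vectors for one aligned sentence pair.

    alpha is a hypothetical weight on the Chinese-sentence classifier;
    the English-sentence classifier receives (1 - alpha).
    """
    if len(p_zh) != len(p_en):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_zh, p_en)]


def predict_move(p_zh: Sequence[float], p_en: Sequence[float],
                 alpha: float = 0.5) -> str:
    """Return the move label with the highest fused probability."""
    fused = fuse_probabilities(p_zh, p_en, alpha)
    best = max(range(len(fused)), key=fused.__getitem__)
    return MOVES[best]


if __name__ == "__main__":
    # Made-up probabilities for illustration only, not results from the paper.
    p_zh = [0.10, 0.45, 0.40, 0.02, 0.03]
    p_en = [0.05, 0.30, 0.60, 0.02, 0.03]
    print(predict_move(p_zh, p_en))  # Results
```

In this made-up case the Chinese classifier alone would choose Methods and the English one Results; the averaged vector favours Results. When both inputs sum to 1, a convex combination also sums to 1, and `alpha` could in principle be tuned on validation data.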

Keywords: cross-lingual; data augmentation; pre-trained model; move recognition; probability fusion

CLC number: G353 [Culture and Science / Information Science]

 
