Recognition Method and Annotation of Academic Claim Sentences (original title: 学术论断句标注与识别方法探索)

Cited by: 4


Authors: Xu Jian (徐健) [1,2]; Guo Yufan (郭语凡); Yu Xuehan (喻雪寒); Huang Yuxin (黄雨馨); Yang Tingting (杨婷婷); Wang Weiyi (王唯一) [1]; Liu Zheng (刘政)

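Affiliations: [1] College of Information Management, Nanjing Agricultural University, Nanjing 210095; [2] The Post-Doctoral Research Center of Agricultural & Forestry Economics and Management, College of Economics and Management, Nanjing Agricultural University, Nanjing 210095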

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2022, No. 7, pp. 707-719 (13 pages)

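Funding: National Social Science Fund of China project "Research on the Theory and Methods of Constructing Domain Academic Viewpoint Databases" (20CTQ025).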

Abstract: Claim sentences in academic texts convey scholars' views and judgments on research questions. Identifying them helps organize and mine the academic viewpoints they contain, so that scholars can carry out research more efficiently. Building on a synthesis of previous studies, this paper proposes three sufficient conditions and three necessary conditions for judging claim sentences, establishing judgment criteria from both the positive and the negative perspective. We developed a claim sentence annotation system, selected a set of papers in the field of information resource management, and carried out annotation experiments at the abstract and full-text levels. We then evaluated how well sequential minimal optimization (SMO), support vector machine (SVM), naive Bayes, decision tree, k-nearest neighbor (kNN), BERT (bidirectional encoder representations from transformers)+FC (fully connected layer), and BERT+BiLSTM (bidirectional long short-term memory) classifiers recognize claim sentences. The results show that: (1) using the criteria proposed in this paper, annotators reach high agreement when labeling claim and non-claim sentences in academic texts at both the abstract and full-text levels; (2) when only textual features are used, BERT+BiLSTM performs best, with precision, recall, and F_1 all above 90%; (3) claim and non-claim sentences differ in their frequency distributions over length, position within the paragraph, position within the text, and TextRank weight; (4) at the abstract level, adding the length feature to the SMO classifier improves F_1 by 0.5%, and at the full-text level, adding the length, relative position within the paragraph, and relative position within the text features to the SVM classifier improves F_1 by 2%.
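The abstract names sentence length and relative position (within the paragraph and within the text) as the non-textual features, and precision, recall, and F_1 as the evaluation measures. The sketch below shows one plausible way to compute them in Python. It is an illustration under stated assumptions, not the authors' implementation: the names `SentenceFeatures`, `relative_position`, `extract_features`, and `precision_recall_f1` are hypothetical, length is assumed to be a character count, and relative position is assumed to be the sentence index scaled to [0, 1]. The paper may define these quantities differently.

```python
from dataclasses import dataclass


@dataclass
class SentenceFeatures:
    text: str
    length: int              # assumed: number of characters in the sentence
    pos_in_paragraph: float  # 0.0 = first sentence of its paragraph, 1.0 = last
    pos_in_document: float   # 0.0 = first sentence of the document, 1.0 = last


def relative_position(index: int, total: int) -> float:
    """Scale a 0-based index to [0, 1]; a single-item sequence maps to 0.0."""
    return index / (total - 1) if total > 1 else 0.0


def extract_features(paragraphs: list[list[str]]) -> list[SentenceFeatures]:
    """Compute length and relative-position features for every sentence.

    `paragraphs` is one document: a list of paragraphs, each a list of
    already-segmented sentences (sentence splitting is out of scope here).
    """
    total = sum(len(p) for p in paragraphs)
    features = []
    doc_index = 0
    for paragraph in paragraphs:
        for par_index, sentence in enumerate(paragraph):
            features.append(SentenceFeatures(
                text=sentence,
                length=len(sentence),
                pos_in_paragraph=relative_position(par_index, len(paragraph)),
                pos_in_document=relative_position(doc_index, total),
            ))
            doc_index += 1
    return features


def precision_recall_f1(gold: list[int], pred: list[int]) -> tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (1 = claim sentence)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Feature vectors built this way could be concatenated with a text representation before training a classifier such as an SVM. Comparing F_1 with and without the extra columns mirrors the kind of comparison the abstract reports.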

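Keywords: sequential minimal optimization algorithm; naive Bayes; support vector machine classifier; information resource management; decision tree; sequence optimization; academic text; position features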

Classification code: G353.1 [Cultural Sciences - Information Science]

 
