Unsupervised Automatic Identification and Analysis of Citation Texts (无监督引用文本自动识别与分析)  Cited by: 5

Identifying Citation Texts with Unsupervised Method

Authors: Hyonil Kim (金贤日); Ou Shiyan (欧石燕) [1] (School of Information Management, Nanjing University, Nanjing 210023, China)

Affiliation: [1] School of Information Management, Nanjing University, Nanjing 210023, China

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2021, No. 1, pp. 66-77 (12 pages)

Funding: One of the research outcomes of a Key Project of the National Social Science Fund of China (Project No. 17ATQ001).

Abstract: [Objective] This paper explores methods for automatically identifying citation texts in citing papers and compares the content of different types of citation sentences. [Methods] We propose an unsupervised method that identifies implicit citation sentences by comparing the text similarity between each candidate sentence and both the citing paper and the cited paper. To compute text similarity accurately, we propose two document vector models that combine the vector space model with word embedding models. [Results] We identified the implicit citation sentences in about 200 citing papers of two highly cited papers; the F-value of the proposed method exceeded 92% in both cases. Comparing the content of explicit and implicit citation sentences showed clear differences in citation function and sentiment: the proportion of implicit citation sentences expressing research background and technical basis was higher than that of explicit ones, while the proportion expressing research basis and research comparison was lower; 45.3% of explicit citation sentences were positive citations, whereas 78.8% of implicit citation sentences were neutral. [Limitations] We only identified citation texts at the sentence level; identification at the clause and phrase levels needs further exploration. [Conclusions] Implicit citation sentences need to be identified when identifying citation texts, and the proposed method performs well.
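
The similarity comparison described in the abstract can be illustrated with a small sketch. The abstract does not specify the two document vector models or the decision rule, so everything below is an assumption made for illustration: the document vector is a TF-IDF-weighted average of word embeddings (one common way to combine a vector space model with word embeddings), a candidate sentence is flagged as an implicit citation when it is more similar to the cited paper than to the citing paper, and `TOY_EMBEDDINGS`, `doc_vector`, and `is_implicit_citation` are hypothetical names with hand-made 2-D vectors rather than trained embeddings.

```python
import math
from collections import Counter

# Hypothetical 2-D embeddings; a real system would load trained vectors.
TOY_EMBEDDINGS = {
    "latent": [1.0, 0.0], "topic": [1.0, 0.1], "model": [0.9, 0.2],
    "citation": [0.1, 1.0], "sentence": [0.0, 1.0], "classifier": [0.2, 0.9],
}

def tokenize(text):
    """Lowercase whitespace tokenisation, keeping alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]

def idf_weights(docs):
    """Smoothed inverse document frequency over tokenised documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def doc_vector(tokens, idf, embeddings, dim):
    """Assumed hybrid model: TF-IDF-weighted average of word embeddings."""
    vec = [0.0] * dim
    total = 0.0
    for term, tf in Counter(tokens).items():
        if term not in embeddings:
            continue  # out-of-vocabulary terms are skipped
        weight = tf * idf.get(term, 1.0)
        for i, x in enumerate(embeddings[term]):
            vec[i] += weight * x
        total += weight
    return [v / total for v in vec] if total else vec

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def is_implicit_citation(candidate, citing_text, cited_text, embeddings, dim=2):
    """Assumed rule: flag the candidate if it is closer to the cited paper."""
    docs = [tokenize(candidate), tokenize(citing_text), tokenize(cited_text)]
    idf = idf_weights(docs)
    cand, citing, cited = (doc_vector(d, idf, embeddings, dim) for d in docs)
    return cosine(cand, cited) > cosine(cand, citing)

cited_text = "latent topic model"
citing_text = "citation sentence classifier"
about_cited = "the model infers latent topic structure"
about_citing = "our classifier labels each citation sentence"

print(is_implicit_citation(about_cited, citing_text, cited_text, TOY_EMBEDDINGS))   # True
print(is_implicit_citation(about_citing, citing_text, cited_text, TOY_EMBEDDINGS))  # False
```

In this toy setup the first candidate shares its vocabulary with the cited text and is flagged, while the second reads like the citing paper's own work and is not. A realistic pipeline would also need candidate selection around explicit citations and a tuned threshold, neither of which the abstract details.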

Keywords: citation text identification; implicit citation sentence; citation content analysis

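Classification: TP393 [Automation and Computer Technology: Computer Application Technology]; G250 [Automation and Computer Technology: Computer Science and Technology]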