Automatic Recognition and Bibliometric Analysis of Cited Books (引书的自动识别及文献计量学分析)  Cited by: 16


Authors: Huang Shuiqing[1,2]; Zhou Hao; Peng Qiuru[1,2]; Wang Dongbo[1,2]

Affiliations: [1] College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095; [2] Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2021, No. 12, pp. 1325-1337 (13 pages)

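Funding: Major Project of the National Social Science Fund of China, "Construction of a Classics Knowledge Base and Humanities Computing Research Based on the Sinological Index Series (《汉学引得丛刊》)" (15ZDB127).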

Abstract: Ancient Chinese texts contain a large number of de facto citation entries, referred to here as cited books (引书). Citation analysis has so far been applied mostly to modern texts, and the academic community has paid little attention to citation phenomena in ancient texts. This paper applies citation analysis to ancient texts, computing and analyzing citation indicators for cited books in order to establish a preliminary framework for the bibliometric study of cited books. Three works from the Shisanjing Zhushu (《十三经注疏》) are taken as the sample: Lunyu Zhushu (《论语注疏》), Maoshi Zhengyi (《毛诗正义》), and Chunqiu Zuozhuan Zhengyi (《春秋左传正义》). Cited-book entries are automatically recognized in these texts with a CRF (conditional random field) model, a Bi-LSTM (bidirectional long short-term memory) model, and a Bi-LSTM-CRF model, and the extraction performance of the three models is compared. Citation analysis is then used to compute and analyze the citation indicators of the cited books in the three works, to examine the knowledge associations between ancient texts, and to discuss the citation behavior of ancient scholars. The results show that machine learning models perform well overall on the automatic recognition of cited-book entries; the two deep learning models perform better, and the CRF model lags clearly behind them. Of the two deep learning models, Bi-LSTM-CRF performs slightly better than Bi-LSTM. The strength of association between ancient texts varies, and the scale of cited books is affected by multiple factors. Works in the Classics category (经部) account for the highest share of citations, especially the works on ritual within that category. In addition, the citation behavior of ancient scholars was influenced by factors such as the purpose of the work, the scholars' knowledge background, and how difficult the cited works were to obtain.
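The pipeline outlined in the abstract (sequence labeling to find cited-book entries, then counting citations) can be illustrated with a minimal sketch. It is not the authors' implementation: the `B-BOOK`/`I-BOOK`/`O` tag scheme, the function names, and the toy character sequence are all assumptions made for illustration, and the tags are hand-written stand-ins for what a CRF or Bi-LSTM-CRF tagger might output.

```python
from collections import Counter


def extract_cited_books(chars, tags):
    """Decode a BIO tag sequence into a list of cited-book title strings.

    The B-BOOK / I-BOOK / O scheme is an assumed labeling convention,
    not one confirmed by the paper.
    """
    titles, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "B-BOOK":
            # A new title starts; flush any title still being built.
            if current:
                titles.append("".join(current))
            current = [ch]
        elif tag == "I-BOOK" and current:
            # Continue the title that is currently open.
            current.append(ch)
        else:
            # Outside a title (or a stray I-BOOK): close any open title.
            if current:
                titles.append("".join(current))
            current = []
    if current:
        titles.append("".join(current))
    return titles


def citation_shares(titles):
    """Return {title: (times cited, share of all citations)}."""
    counts = Counter(titles)
    total = sum(counts.values())
    return {title: (n, n / total) for title, n in counts.items()}


# Invented toy sequence with hand-written tags, for illustration only.
chars = list("周礼云又周礼曰诗云")
tags = ["B-BOOK", "I-BOOK", "O", "O", "B-BOOK", "I-BOOK", "O", "B-BOOK", "O"]

titles = extract_cited_books(chars, tags)
shares = citation_shares(titles)
for title, (n, share) in shares.items():
    print(f"{title}: cited {n} time(s), share {share:.2f}")
```

Character-level tagging is used here because classical Chinese text has no word delimiters; whether the paper tags characters or larger units is not stated in the abstract.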

Keywords: cited books; CRF; LSTM; citation analysis; citation behavior

Classification number: G353.1 [Cultural Science: Information Science]

 
