Analysis of Dataset Citing Behaviors in the Field of Natural Language Processing in China  Cited by: 1


Authors: XU LinHong[1], WANG KaiDa, ZHANG LiJie[1] (School of Software, Dalian University of Foreign Languages, Dalian 116000, P.R. China)

Affiliation: [1] School of Software, Dalian University of Foreign Languages, Dalian 116000

Source: Digital Library Forum (《数字图书馆论坛》), 2023, No. 11, pp. 29-37 (9 pages)

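Funding: Supported by the National Natural Science Foundation of China project "Research on Multilingual Text Sentiment Analysis Methods for Social Media" (Grant No. 61806038).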

Abstract: As scientific research grows increasingly dependent on data, analyzing how datasets are cited in the field of natural language processing (NLP) in China helps standardize the construction and use of datasets and promotes the rapid development of the field. This paper takes 1,628 papers published in the Journal of Chinese Information Processing from 2013 to 2022 as its sample and, through full-text analysis, manually annotates 1,970 dataset citation records to study how the literature cites datasets. The findings are as follows. In NLP research in China, the number of papers citing others' datasets is gradually increasing, while the number of papers using self-built datasets is gradually decreasing; furthermore, the average citation frequency per paper is higher for papers citing datasets than for papers using self-built datasets. There is a clear tendency to cite multiple datasets, and the number of papers citing a single dataset is gradually decreasing; moreover, the average citation frequency per paper is higher for papers citing two to three datasets than for papers citing a single dataset. Dataset reusability is relatively low, and highly cited datasets come mainly from evaluation campaigns.

Keywords: dataset citation; data citation; natural language processing; highly cited datasets; dataset reuse

Classification number: G353.1 [Cultural Sciences: Information Science]

 
