Authors: Hu Haotian (胡昊天); Deng Sanhong (邓三鸿) [2,3]; Kong Ling (孔玲) [4,5]; Yan Xiaohui (闫晓慧); Yang Wenxia (杨文霞); Wang Dongbo (王东波); Shen Si (沈思) [3,6]
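Affiliations: [1] Jiangsu Academy of Agricultural Sciences, Nanjing 210014; [2] School of Information Management, Nanjing University, Nanjing 210023; [3] Key Laboratory of Data Engineering and Knowledge Services in Provincial Universities (Nanjing University), Nanjing 210023; [4] School of Information Management, Shandong University of Technology, Zibo 255049; [5] College of Information Management, Nanjing Agricultural University, Nanjing 210095; [6] School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094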
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2024, No. 5, pp. 588-600 (13 pages)
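Funding: Major Program of the National Social Science Fund of China, "Research on Information Science Education and Development Oriented to National Strategy" (20&ZD332); General Program of the National Natural Science Foundation of China, "Research on the Construction and Retrieval of Academic Full-Text Knowledge Graphs Based on Deep Learning" (71974094); Fundamental Research Funds for the Central Universities, Nanjing University (0108-14370317).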
Abstract: Information science terminology conveys the basic knowledge and core concepts of the information science discipline. Sorting out and analyzing information science terms at the conceptual level is therefore of great significance for promoting the development of the discipline and supporting downstream knowledge mining tasks. With the rapidly growing volume of scientific and technological literature, automatic term extraction has replaced manual screening, but existing methods rely heavily on large-scale labeled datasets and are difficult to transfer to low-resource scenarios. This study designs a generative term extraction method for information science (Generative Term eXtraction for Information Science, GTX-IS), which transforms the traditional extraction task based on sequence labeling into a sequence-to-sequence generation task. By combining few-shot learning strategies with supervised fine-tuning, it improves task-specific text generation and can extract information science terms fairly accurately when labeled data are scarce. Building on the extraction results, the study further carries out term discovery and multi-dimensional knowledge mining in the field of information science. Full-text scientometric and informetric methods are used to perform statistical analysis and knowledge mining on term frequency, life cycle, and co-occurrence from the dimensions of the terms themselves, the relationships between terms, and time information. Using social network analysis combined with temporal features, the study refines the dynamic profiles of journals from the perspective of terms and explores the research hotspots, evolution, and future development trends of information science. In the term extraction experiments, the proposed method outperforms all 13 mainstream generative and extractive baseline models, showing strong few-shot learning ability and offering a new approach to domain information extraction.
Keywords: information science terms; automatic term extraction; text generation; scientometrics; hotspot analysis
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]