Authors: Shi Bin, Wang Hao[1,2], Li Xiaomin, Zhou Shu (School of Information Management, Nanjing University, Nanjing 210023; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023)
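Affiliations: [1] School of Information Management, Nanjing University, Nanjing 210023 [2] Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023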
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2025, Issue 2, pp. 220-233 (14 pages)
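Funding: National Natural Science Foundation of China project "Semantic Parsing and Humanities Computing of China's Intangible Cultural Heritage Texts Driven by Linked Data" (72074108); Jiangsu Society for Library Science project "Content System and Target Path for Age-Friendly Public Cultural Services in Jiangsu Province" (22YB056).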
Abstract: As the number of researchers grows, the volume of published scientific papers is increasing rapidly, making the work of archiving, cataloguing, and analyzing the literature increasingly burdensome. Existing literature classification models focus mainly on a paper's content and ignore the relational information associated with it. To address this, this study proposes PAITKG (paper analysis by incorporating text and knowledge graph), a paper representation model that integrates content information with academic networks. Knowledge graph embedding is used to represent the multiple relations among papers, SciBERT fine-tuned with Adapters is used to extract content features, and the two are fused. During training, the study improves a dynamic adversarial loss function to guide the model to attend more closely to misclassified results, and it evaluates the method on literature datasets from two fields, digital humanities and multimodal learning. On multi-label discipline classification of scientific literature, PAITKG improves significantly over the baselines and raises classification accuracy. In addition, through learning on the upstream task, the representations gain wider applicability: without any additional training, the feature vectors extracted by the model transfer well to analysis tasks such as topic clustering and scholar recommendation. The results show that, by constructing and representing the academic networks of papers, PAITKG effectively integrates relational information about the literature and improves its understanding of literature data, and that its learned representations have strong generalization potential and can be applied to a variety of literature analysis tasks.
Keywords: document representation; semantic representation; relational features; knowledge graph embedding; RelaGraph
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]