An Analysis of Borrowing Hot Spots in University Libraries Based on BERT for Domain Segmentation Optimization

Original Chinese title: 基于BERT的领域分词优化高校图书馆借阅热点分析

Authors: CHEN Jinzhuan (陈金传) [1]; CHENG Zhiqiang (成志强); XIONG Zequan (熊泽泉) [1]; YU Yaxiu (于亚秀) [1]

Affiliation: [1] Library of East China Normal University, Shanghai 200241, China

Source: Information Science (《情报科学》), 2024, No. 11, pp. 76-83, 111 (9 pages in total)

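Funding: National Social Science Fund of China project "Research on an Innovative Information Resource Service System for Interdisciplinary Integration" (23BTQ084)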

Abstract:

[Purpose/significance] Changes in library borrowing data reflect shifts in what borrowers focused on in a given year and, to some extent, the research hotspots of society as a whole. This paper uses a large language model to model the relationship between the fields of university library borrowing and reservation records and social hotspots, explores that relationship, and supports the analysis of social hotspots over a given period.

[Method/process] First, a word segmentation model for book titles is built with an encoder-decoder structure and trained on large word segmentation datasets to obtain raw word frequencies. Next, domain matching is performed using the reader's department and the call number in the record fields. Finally, the raw word frequencies are reweighted from three perspectives (number of borrowings, reservation duration, and domain) to produce the final hot-word cloud related to social hotspots.

[Result/conclusion] Experiments on the segmentation model show that the proposed algorithm achieves clearly higher F-values than the other algorithms on the MSR, PKU, and CTB6 datasets. On the CTB6 segmentation dataset its F-value reaches 97.18, which is 3.15 percentage points higher than the CRF algorithm. With domain optimization added, the segmentation algorithm performs better on highly specialized text. Experiments on library borrowing and reservation data then demonstrate the advantages of the hot-word cloud generation framework based on domain segmentation optimization, and show that the hot words it generates can be linked to social hotspots to some degree.

[Innovation/limitation] This paper studies the field characteristics of book borrowing and reservation data and proposes a BERT-based borrowing hotspot generation framework with domain segmentation optimization. Although the framework exploits the field characteristics of library data and improves the generated word clouds, there is no quantitative metric for the performance of hot-word cloud generation, and further exploration and research are needed.
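The reweighting step described in the method can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so everything below is an assumption made for illustration: the `LoanRecord` fields, the `record_weight` function, the logarithmic damping, and the coefficients `alpha`, `beta`, and `gamma` are all hypothetical. The sketch also assumes that book titles have already been segmented into tokens by some word segmenter.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class LoanRecord:
    """One borrowing/reservation record (hypothetical field layout)."""
    title_tokens: list        # tokens produced by a word segmenter on the book title
    borrow_count: int         # number of times the title was borrowed in the period
    reservation_days: float   # how long the reservation lasted, in days
    domain_match: bool        # True if reader department and call-number domain agree


def record_weight(rec, alpha=1.0, beta=0.5, gamma=0.5):
    """Hypothetical weight that grows with borrowing, reservation time, and domain match.

    log1p damps very large counts so a single heavily borrowed title
    does not dominate the word cloud. The coefficients are illustrative only.
    """
    weight = 1.0  # baseline: every record contributes its raw frequency
    weight += alpha * math.log1p(rec.borrow_count)
    weight += beta * math.log1p(rec.reservation_days)
    if rec.domain_match:
        weight += gamma
    return weight


def weighted_term_frequencies(records):
    """Accumulate a weighted frequency for every title token."""
    freqs = Counter()
    for rec in records:
        w = record_weight(rec)
        for token in rec.title_tokens:
            freqs[token] += w
    return freqs


if __name__ == "__main__":
    sample = [
        LoanRecord(["deep", "learning"], borrow_count=10, reservation_days=7.0, domain_match=True),
        LoanRecord(["learning", "history"], borrow_count=0, reservation_days=0.0, domain_match=False),
    ]
    # Print tokens from the highest weighted frequency to the lowest.
    for token, score in weighted_term_frequencies(sample).most_common():
        print(token, round(score, 3))
```

The resulting token-to-weight mapping is the kind of input a word cloud renderer typically takes. The additive form was chosen only because it keeps the three factors easy to inspect separately. The paper may combine them differently.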

Keywords: word segmentation; hotspot research; word cloud; artificial intelligence; library

Classification code: G251 [Cultural Sciences / Library Science]
