An Empirical Study of Multi-Layered Automatic Book Classification Based on Semantics (original title: 基于语义的多层式图书自动分类实证研究)  Cited by: 1

Authors: Gao Bin; Ma Juhong; Gu Ting (高斌 马菊红 顾婷)

Affiliation: [1] Library of Jiangsu University of Science and Technology (江苏科技大学图书馆)

Source: Research on Library Science (《图书馆学研究》), 2024, No. 8, pp. 62-76 (15 pages)

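Funding: An outcome of the 2024 General Project of Philosophy and Social Science Research in Jiangsu Universities, "Innovative Research on MARC 21 in the Network Environment Based on Linked Data" (Project No. 2024SJYB1620).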

Abstract: To address the consistency and efficiency problems of manual book classification in libraries, this study applies a multi-layered automatic book classification system to classification work and introduces semantic concepts as a strategy for improving classification quality. Because the volume of data or the number of document features may be insufficient during classification, the strategy uses Word2Vec, which preserves the semantic relationships between a target word and its context, to expand semantically related words into feature words. Using data collected from the library's Cxstar (畅想之星) Chinese e-books, four classifiers (Naive Bayes, support vector machine (SVM), decision tree (C4.5), and K-nearest neighbors (KNN)) are applied in the multi-layered automatic book classification system. On the semantic side, Word2Vec is trained on a corpus, a thesaurus-like synonym dictionary is built from the training results, and the feature words of the classification data set are then expanded; classification performance is evaluated by accuracy. Experimental results show that multi-layered automatic book classification outperforms traditional automatic book classification in a library environment and that the proposed strategy improves the accuracy of book classification.
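The feature-expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for embeddings that a trained Word2Vec model would supply, and the names `TOY_VECTORS`, `build_thesaurus`, `expand_features`, and the similarity threshold of 0.9 are all assumptions made for illustration. The sketch builds a thesaurus-like synonym dictionary from cosine similarity and then appends each token's synonyms to a book's feature words.

```python
import math

# Hypothetical 3-dimensional vectors standing in for Word2Vec embeddings.
# In practice these would come from a model trained on a corpus.
TOY_VECTORS = {
    "library": [0.9, 0.1, 0.0],
    "archive": [0.8, 0.2, 0.1],
    "algorithm": [0.0, 0.9, 0.3],
    "program": [0.1, 0.8, 0.4],
    "poetry": [0.1, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_thesaurus(vectors, threshold=0.9):
    """Map each word to the other words whose similarity meets the threshold.

    The threshold is an assumed value, not one reported in the paper.
    """
    thesaurus = {}
    for word, vec in vectors.items():
        thesaurus[word] = sorted(
            other
            for other, other_vec in vectors.items()
            if other != word and cosine(vec, other_vec) >= threshold
        )
    return thesaurus


def expand_features(tokens, thesaurus):
    """Append synonyms of each token, keeping order and avoiding duplicates."""
    expanded = list(tokens)
    seen = set(tokens)
    for token in tokens:
        for synonym in thesaurus.get(token, []):
            if synonym not in seen:
                expanded.append(synonym)
                seen.add(synonym)
    return expanded


if __name__ == "__main__":
    thesaurus = build_thesaurus(TOY_VECTORS)
    print(expand_features(["library", "poetry"], thesaurus))
```

In a fuller pipeline, the expanded token lists would then be vectorized and passed to a classifier such as Naive Bayes, SVM, a decision tree, or KNN, which is where short records with few feature words stand to benefit from the added vocabulary.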

Keywords: classification number; multi-layered; automatic book classification; Word2Vec

Classification No.: G254.1 [Cultural Science: Library Science]

 
