Electronic Document Classification Based on Wavelet Analysis

Affiliations: [1] Shandong University Library, Jinan 250100; [2] Southern Medical University Library, Guangzhou 510515

Source: Journal of the China Society for Scientific and Technical Information (情报学报), 2013, No. 9, pp. 1000-1008 (9 pages)

Abstract: Automatic classification of document data will play an increasingly important role in digital libraries (DL). The common approach classifies standard test collections with kernel methods based on support vector machines (SVM), and it has several drawbacks: the document vectors are very large, the kernel functions are non-orthogonal and polysemous, computing recurrence rates is time-consuming, and, because the algorithms are not tested on real digital-library data, their practical persuasiveness is limited. To address these problems, term expansion is used to preprocess the document vectors, producing new vectors that are fewer but more precise, orthogonal, and unambiguous. These document vectors are then ordered semantically to speed up access and computation, and a wavelet kernel maps the documents into L2 space for classification. The method is validated on real classification records from China National Knowledge Infrastructure (CNKI), using both abstracts and full texts. The experimental results show that it outperforms the kernel method and has both theoretical and practical value.
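
The abstract does not give the form of the wavelet kernel used to map documents into L2 space. As a rough illustration only, the sketch below assumes the Morlet-type, translation-invariant wavelet kernel often used in wavelet SVMs, K(x, y) = ∏ᵢ h((xᵢ − yᵢ)/a) with h(u) = cos(1.75u)·exp(−u²/2), where `a` is a dilation parameter. The function names, the value of `a`, and the toy document vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def morlet_wavelet(u: float) -> float:
    """Assumed mother wavelet h(u) = cos(1.75 u) * exp(-u^2 / 2) (Morlet-type)."""
    return math.cos(1.75 * u) * math.exp(-u * u / 2.0)


def wavelet_kernel(x: Sequence[float], y: Sequence[float], a: float = 1.0) -> float:
    """Translation-invariant wavelet kernel K(x, y) = prod_i h((x_i - y_i) / a).

    x, y are document vectors (e.g. term weights) of equal length;
    a is a hypothetical dilation (scale) parameter chosen by the user.
    """
    if len(x) != len(y):
        raise ValueError("document vectors must have the same length")
    if a <= 0:
        raise ValueError("dilation parameter a must be positive")
    k = 1.0
    for xi, yi in zip(x, y):
        k *= morlet_wavelet((xi - yi) / a)
    return k


def gram_matrix(docs: Sequence[Sequence[float]], a: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values between documents, usable as a precomputed SVM kernel."""
    return [[wavelet_kernel(d1, d2, a) for d2 in docs] for d1 in docs]


if __name__ == "__main__":
    # Toy term-weight vectors for three documents (illustrative values only).
    docs = [[0.2, 0.0, 0.7], [0.1, 0.3, 0.6], [0.9, 0.8, 0.0]]
    for row in gram_matrix(docs, a=1.0):
        print(["%.4f" % v for v in row])
```

Because h(0) = 1 and h is even, every document has kernel value 1 with itself and the matrix is symmetric. A Gram matrix built this way could, for example, be passed to scikit-learn's `SVC(kernel="precomputed")` to train a classifier; whether the authors used this kernel form or this kind of solver is not stated in the abstract.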

Keywords: electronic document classification; machine learning; support vector machine; L2 space; wavelet analysis

Chinese Library Classification number: G254 [Culture and Science: Library Science]
