检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:李湘东[1,2] 刘康[1] 丁丛[1] 高凡[1]
机构地区:[1]武汉大学信息管理学院,武汉430072 [2]武汉大学信息资源研究中心,武汉430072
出 处:《现代图书情报技术》2016年第2期59-66,共8页New Technology of Library and Information Service
基 金:国家社会科学基金项目"多种类型文本数字资源自动分类研究"(项目编号:15BTQ066)的研究成果之一
摘 要:【目的】解决由于不同类型文献而产生的特征不匹配等问题,提高待分类文本的分类效果。【方法】使用与待分类文本属于不同文献类型的文本作为语料库的训练集,引入第三方资源《知网》进行语义特征扩展。【结果】利用该方法在网页、图书、非学术性期刊、学术性期刊4种类型文献上进行分类实验,与未经过扩展的分类方法相比,分类准确率提高1.2%至11.0%。【局限】未对每一种文献类型都使用公开语料进行测试,因此本文方法的通用性和实验结果的客观性有待进一步检验。【结论】实验结果表明,该方法具有一定的可行性和实用性,在不同程度上可以消除不同类型文献之间的语义差异,从语料库构建和特征扩展两个途径提高文本自动分类的分类效果。[Objective] This paper aims to solve the feature mismatch problem caused by different document types and improve the performance of automatic classification technology. [Methods] We proposed a new method to extend the semantic features using documents of various types as the corpus, which were introduced the third-party resource How Net and were different with the other un-categorized ones. [Results] Compared with the non-feature-extension classification method, the proposed method increased the F-measure by 1.2% to 11.0% in our classification experiment. Four document types, used in our study included webpages, books, non-academic periodicals and academic journals. [Limitations] Not every type of document was tested with the publicly accessible corpus, thus, more tests were needed to examine the generalization and objectiveness of the new method. [Conclusions] Our study showed that the proposed method was feasible. It could effectively eliminate the semantic differences among various types of collections and improve the performance of automatic text classification through corpus construction and feature extension.
分 类 号:TP391.1[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.3