Source: Information Studies: Theory & Application (《情报理论与实践》), 2018, No. 5, pp. 143-149, 136 (8 pages in total)
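Funding: An outcome of the National Social Science Fund of China project "Research on Automatic Classification of Multiple Types of Text Digital Resources" (project no. 15BTQ066)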
Abstract: To meet the need for unified classification and organization of the many types of digitized resources held by digital libraries, this paper explores and analyzes the feasibility of mixed classification of multiple document types in a digital library. It introduces a semantic topic model construction method and uses the external knowledge base Wikipedia for semantic expansion, building an automatic classification method for mixed document types based on topic semantic expansion. The study finds that, in mixed classification of multiple document types, web pages and non-academic journal articles have high affinity with each other, as do books and academic journal articles, so that each member of a pair can serve as the training set for the other and still achieve high classification performance; that different classification algorithms differ in their learnability and adaptability for mixed classification, with Naive Bayes and the maximum entropy model (MaxEnt) adapting better to mixed document types than the Support Vector Machine (SVM); and that topic semantic expansion effectively reduces the differences in text features between document types, strengthens their affinity in mixed classification, and improves mixed classification performance.
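The cross-type setup described in the abstract (train on one document type, test on another, with and without semantic expansion) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `CONCEPTS` lookup is a hypothetical stand-in for Wikipedia-based expansion, the `BOOKS` and `WEB` records are invented toy data, and the hand-written Naive Bayes is not the authors' implementation. The abstract does not specify the topic model, the expansion procedure, or the features used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for Wikipedia-based semantic expansion: maps a surface
# term to a shared concept label. The paper's real procedure is not given here.
CONCEPTS = {
    "svm": "machine_learning",
    "neural": "machine_learning",
    "catalog": "library_science",
    "librarian": "library_science",
}

# Invented toy records: (tokens, class label). Two document types share no
# surface vocabulary, mimicking the feature gap between document types.
BOOKS = [(["svm", "kernel"], "CS"), (["catalog", "shelf"], "LIS")]
WEB = [(["neural", "blog"], "CS"), (["librarian", "news"], "LIS")]


def expand(tokens):
    """Append the concept label of every token that has one."""
    return tokens + [CONCEPTS[t] for t in tokens if t in CONCEPTS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label in sorted(self.class_counts):  # sorted for a stable tie-break
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def accuracy(train, test, use_expansion):
    """Train on one document type, test on another, optionally expanding both."""
    prep = expand if use_expansion else (lambda tokens: tokens)
    model = NaiveBayes().fit([prep(d) for d, _ in train], [y for _, y in train])
    hits = sum(model.predict(prep(d)) == y for d, y in test)
    return hits / len(test)


if __name__ == "__main__":
    print("without expansion:", accuracy(BOOKS, WEB, use_expansion=False))
    print("with expansion:   ", accuracy(BOOKS, WEB, use_expansion=True))
```

Without expansion, the toy test documents share no words with the training documents, so the classifier falls back on class priors; with expansion, the shared concept labels give it something to match on. This only illustrates the mechanism on invented data and says nothing about the results reported in the paper.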