Variational Semi-Supervised Baidu Encyclopedia Classification Based on Wiki Knowledge


Authors: Han Peifu; Yu Zhengtao [1,2]; Guo Junjun; Gao Shengxiang [1,2]; Lai Hua [1,2]

Affiliations: [1] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Source: Computer Applications and Software (《计算机应用与软件》), 2024, No. 7, pp. 128-135, 144 (9 pages in total)

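Funding: National Natural Science Foundation of China (61972186, 61762056, 61472168); Yunnan Provincial Major Science and Technology Special Plan Project (202002AD080001); Yunnan Provincial High-Tech Industry Special Project (201606); Key Project of the Yunnan Provincial Applied Basic Research Program (2019FA023).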

Abstract: Cross-lingual knowledge graphs are mostly built on Wikipedia, but Wikipedia contains relatively few Chinese entities, which makes it difficult to build a large-scale cross-lingual knowledge graph centered on Chinese. How to use existing large-scale Chinese encyclopedia knowledge bases such as Baidu Encyclopedia to assist in building cross-lingual knowledge graphs is a pressing problem. However, Wikipedia and Baidu Encyclopedia use different classification systems, which widens the scope and raises the difficulty of cross-encyclopedia retrieval. To address this, a semi-supervised Baidu Encyclopedia classification method is proposed that is guided by a small amount of Wikipedia knowledge with category labels. Semantic features and statistical features of the encyclopedia abstract text are obtained from word embeddings and a bag-of-words (BoW) model, respectively. The two are fused as the input to a variational autoencoder to obtain a semantic representation of the text. A classification loss on a small amount of labeled Wikipedia data and a reconstruction loss on a large amount of unlabeled Baidu Encyclopedia data are combined into a semi-supervised classification loss, which unifies the two classification systems. Experimental results show that the proposed method can accurately transfer Baidu Encyclopedia entries to the Wikipedia classification system.
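The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the mean-pooled embedding, the squared-error reconstruction term, the KL term against a standard normal prior, and the weight `alpha` are all assumptions made for the example. The encoder, decoder, and classifier networks are omitted; their outputs are passed in as plain lists.

```python
import math
from collections import Counter


def bow_vector(tokens, vocab):
    """Statistical features: normalized bag-of-words counts over a fixed vocabulary."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) or 1  # avoid division by zero for empty text
    return [counts[w] / total for w in vocab]


def mean_embedding(tokens, emb, dim):
    """Semantic features: average of word vectors; unknown words are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def fuse(tokens, vocab, emb, dim):
    """Concatenate both feature views to form the (assumed) encoder input."""
    return mean_embedding(tokens, emb, dim) + bow_vector(tokens, vocab)


def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, diag(exp(logvar))) and N(0, I)."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def squared_error(x, x_hat):
    """Reconstruction error between an input and its decoded version."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def cross_entropy(probs, label):
    """Classification loss for one labeled example."""
    return -math.log(max(probs[label], 1e-12))


def semi_supervised_loss(labeled, unlabeled, alpha=1.0):
    """Combine a supervised term and an unsupervised term.

    labeled:   list of (class_probs, label) pairs, e.g. Wikipedia entries
    unlabeled: list of (x, x_hat, mu, logvar) tuples, e.g. Baidu entries
    alpha:     hypothetical weight on the classification term
    """
    cls = sum(cross_entropy(p, y) for p, y in labeled) / max(len(labeled), 1)
    rec = sum(
        squared_error(x, x_hat) + kl_to_standard_normal(mu, logvar)
        for x, x_hat, mu, logvar in unlabeled
    ) / max(len(unlabeled), 1)
    return alpha * cls + rec
```

In this sketch the labeled Wikipedia examples contribute only through the classification term and the unlabeled Baidu Encyclopedia examples only through the reconstruction term, mirroring the split the abstract describes; how the paper actually weights or shares these terms is not stated here.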

Keywords: classification system; text classification; semi-supervised; bag-of-words model; variational autoencoder

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
