基于知识图谱的蒙汉双语五畜主题问答研究  

Research on Mongolian-Chinese Bilingual Q&A System for the Theme of the Five Livestock Based on Knowledge Graph


Authors: SUN Meiming (孙美名); BUYINQIQIGE (布音其其格) [2]; HASI (哈斯)

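Affiliations: [1] School of Information and Media, Inner Mongolia Technical College of Construction, Hohhot 010070, Inner Mongolia, China; [2] Business School, Hohhot Minzu College, Hohhot 010051, Inner Mongolia, China; [3] School of Computer and Information Technology, Hohhot Minzu College, Hohhot 010051, Inner Mongolia, China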

Source: Journal of Minzu University of China (Natural Sciences Edition) (《中央民族大学学报(自然科学版)》), 2024, No. 4, pp. 39-49 (11 pages)

Funding: National Natural Science Foundation of China (62166017); National Social Science Fund of China (24BYY062).

Abstract: The Mongolian term for the five livestock refers to horses, cattle, camels, sheep, and goats. Mongolian vocabulary specific to the five livestock is rich and carries distinct ethnic characteristics: the category, sex, age, coat color, and temperament of each animal all have detailed names, and many common or dialect words exist. Together these words form a complete semantic field. In Mongolian information processing, however, there is no search engine comparable to Baidu Encyclopedia, and knowledge graph technology has seen little application. This study takes structured Mongolian five-livestock data as its main source, extracts the entities, attributes, and relations of five-livestock knowledge, and constructs a Mongolian-Chinese bilingual domain knowledge graph. On top of this graph, entity recognition for input Mongolian and Chinese natural-language questions is optimized by combining the AC automaton algorithm with a text similarity algorithm. Query intent is determined by a multi-class classification model that takes collected Mongolian-Chinese bilingual question words and question sentences as input, which improves the accuracy of sentence-meaning analysis. Using TF-IDF and embedding features in turn, the study builds Linear SVM, Nonlinear SVM, Naive Bayes, Logistic Regression, Random Forest, XGBoost, and LightGBM classifiers. Experiments show that the Linear SVM built on embedding features performs best. Finally, Mongolian-Chinese bilingual automatic question answering is implemented on the knowledge graph through question understanding, semantic matching, and answer retrieval. System performance is tested on 300 randomly selected Mongolian-Chinese bilingual question-answer pairs from each of five categories, with an average accuracy of 89.2%. The results provide strong support for disseminating Mongolian five-livestock knowledge and for applications such as disease question answering.
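The entity recognition step described in the abstract (dictionary matching with an AC automaton, backed by a text similarity algorithm) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the English entity names stand in for the bilingual dictionary, `difflib.SequenceMatcher` stands in for the unspecified similarity algorithm, and the names `AhoCorasick`, `best_fuzzy_match`, `recognize_entities`, and the 0.6 threshold are hypothetical.

```python
from collections import deque
from difflib import SequenceMatcher


class AhoCorasick:
    """Minimal Aho-Corasick automaton for multi-pattern dictionary matching."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # patterns that end at each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find(self, text):
        """Return (start_index, pattern) for every dictionary hit in text."""
        hits, state = [], 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


def best_fuzzy_match(question, entity_names, threshold=0.6):
    """Assumed fallback: compare each name with same-width windows of the question."""
    best_name, best_score = None, 0.0
    for name in entity_names:
        width = len(name)
        for start in range(max(1, len(question) - width + 1)):
            window = question[start:start + width]
            score = SequenceMatcher(None, name, window).ratio()
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def recognize_entities(question, entity_names, threshold=0.6):
    """Exact dictionary hits first; fall back to similarity for misspellings."""
    question = question.lower()
    exact = {p for _, p in AhoCorasick(entity_names).find(question)}
    if exact:
        return sorted(exact)
    fuzzy = best_fuzzy_match(question, entity_names, threshold)
    return [fuzzy] if fuzzy else []


# Hypothetical stand-in for the bilingual entity dictionary.
ENTITIES = ["horse", "cattle", "camel", "sheep", "goat"]
```

Calling `recognize_entities("How long does a camel live?", ENTITIES)` takes the exact-match path, while a misspelled question such as "What do camals eat?" falls through to the similarity fallback. Trying exact matching first keeps the common case linear in the question length; the windowed comparison is only one plausible way to apply a similarity measure to a full question.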

Keywords: domain knowledge graph; question answering system; Mongolian five livestock

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
